Abstract 

A relational database is said to be uncertain if primary key constraints can possibly be violated. A repair 
(or possible world) of an uncertain database is obtained by selecting a maximal number of tuples without ever 
selecting two distinct tuples with the same primary key value. For any Boolean query q, CERTAINTY (q) is the 
problem that takes an uncertain database db on input, and asks whether q is true in every repair of db. The 
complexity of this problem has been particularly studied for q ranging over the class of self-join-free Boolean 
conjunctive queries. A research challenge is to determine, given q, whether CERTAINTY(q) belongs to the 
complexity classes FO, P, or coNP-complete. In this paper, we combine existing techniques for studying the 
above complexity classification task. We show that for any self-join-free Boolean conjunctive query q, it can be 
decided whether or not CERTAINTY(q) is in FO. Further, for any self-join-free Boolean conjunctive query q, 
CERTAINTY(q) is either in P or coNP-complete, and the complexity dichotomy is effective. This settles a 
research question that has been open for ten years, since [9]. 


1 Introduction 

Primary key violations provide an elementary means for capturing uncertainty in the relational data model. A 
block is a maximal set of tuples of the same relation that agree on the primary key of the relation. Tuples in the 
same block are mutually exclusive: exactly one tuple is true, but we are uncertain about which one. We will refer 
to databases as “uncertain databases” to stress that they can violate primary key constraints. 

A repair (or possible world) of an uncertain database is obtained by selecting exactly one tuple from each block. In 
general, the number of repairs of an uncertain database can be exponential in its size. For instance, if an uncertain 
database contains n blocks with two tuples each, then it contains 2n tuples and has $2^n$ repairs. 
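
The count can be checked mechanically. The following Python sketch is an illustration and not part of the paper: it groups facts into blocks and enumerates repairs by picking one fact per block. The fact encoding `(relation, key, rest)` and the helper names `blocks` and `repairs` are assumptions made for this example.

```python
from collections import defaultdict
from itertools import product

def blocks(db):
    """Group facts by (relation name, primary-key value)."""
    grouped = defaultdict(list)
    for rel, key, rest in db:
        grouped[(rel, key)].append((rel, key, rest))
    return list(grouped.values())

def repairs(db):
    """Yield every repair: exactly one fact chosen from each block."""
    for choice in product(*blocks(db)):
        yield frozenset(choice)

# Hypothetical instance: n = 3 blocks with two tuples each.
db = [("R", (i,), (v,)) for i in range(3) for v in ("a", "b")]
all_repairs = list(repairs(db))
```

Here `db` holds 2n = 6 tuples and `all_repairs` holds $2^3 = 8$ repairs, each with one tuple per block.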

There are two natural semantics for answering Boolean queries q on an uncertain database. Under the possibility 
semantics, the question is whether the query evaluates to true on some repair. Under the certainty semantics, 
which is adopted in this paper, the question is whether the query evaluates to true on every repair. The certainty 
semantics adheres to the paradigm of consistent query answering mia, which introduces the notion of database 
repairs with respect to general integrity constraints. In this work, repairing is exclusively with respect to primary 
key constraints, one per relation. 
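
The difference between the two semantics can be made concrete with a brute-force sketch in Python. This is only an illustration under stated assumptions: facts are encoded as `(relation, key, rest)`, a Boolean query is modelled as an arbitrary Python predicate over a repair, and the `Emp` data is hypothetical. The enumeration is exponential in the number of blocks, so it is not a practical algorithm.

```python
from collections import defaultdict
from itertools import product

def repairs(db):
    """Yield every repair (one fact per block) of an uncertain database."""
    grouped = defaultdict(list)
    for fact in db:
        rel, key, _rest = fact
        grouped[(rel, key)].append(fact)
    for choice in product(*grouped.values()):
        yield set(choice)

def possible(db, query):
    """Possibility semantics: the query holds in some repair."""
    return any(query(r) for r in repairs(db))

def certain(db, query):
    """Certainty semantics: the query holds in every repair."""
    return all(query(r) for r in repairs(db))

# Hypothetical data: employee 1 has two conflicting cities.
db = [("Emp", (1,), ("Mons",)), ("Emp", (1,), ("Paris",)), ("Emp", (2,), ("Mons",))]

def someone_in_paris(repair):
    return any(rest == ("Paris",) for _rel, _key, rest in repair)

def someone_in_mons(repair):
    return any(rest == ("Mons",) for _rel, _key, rest in repair)
```

On this instance, "someone lives in Paris" is possible but not certain, whereas "someone lives in Mons" is certain because employee 2 is in Mons in every repair.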

For any Boolean query q, the decision problem CERTAINTY (q) is the following. 


Problem: CERTAINTY(q) 
Input: uncertain database db 
Question: Does every repair of db satisfy q? 


Three comments are in order. First, the Boolean query q is not part of the input. Every Boolean query q thus 
gives rise to a new problem. Since the input to CERTAINTY(q) is an uncertain database, we consider the data 
complexity of the problem. Second, we will assume that every relation name in q or db has a fixed known arity 
and primary key. The primary key constraints are thus implicitly present in all problems. Third, all the complexity 
results obtained in this paper can be carried over to non-Boolean queries; the restriction to Boolean queries eases 
the technical treatment, but is not fundamental. 

The complexity of CERTAINTY(q) has gained considerable research attention in recent years, especially for q 
ranging over the set of self-join-free conjunctive queries. A challenging question is to distinguish queries q for 
which the problem CERTAINTY(q) is tractable from queries for which the problem is intractable. Further, if 
CERTAINTY(q) is tractable, one may ask whether it is first-order expressible. We will refer to these questions as 
the complexity classification task of CERTAINTY(q). 

In the past decade, a variety of tools and techniques have been used in the complexity classification task of 
CERTAINTY(q) for self-join-free conjunctive queries q. In their pioneering work, Fuxman and Miller [9] 
introduced the notion of join graph (not to be confused with the classical notion of join tree). Later on, Wijsen [14] 
introduced the notion of attack graph. Kolaitis and Pema [10] applied Minty's algorithm [13] to the task. Koutris 
and Suciu [11] introduced the notion of query graph and the distinction between consistent and possibly 
inconsistent relations. All these techniques have limited applicability: join graphs seem too rudimentary to obtain general 
complexity dichotomies; attack graphs make it possible to characterize first-order expressibility of CERTAINTY(q), but 
only for queries q that are acyclic (in the sense of [4]); Minty's algorithm has been used to establish a P-coNP-complete 
dichotomy in the complexity of CERTAINTY(q), but only for queries q with exactly two atoms; the framework 
of Koutris and Suciu has also resulted in a P-coNP-complete dichotomy, but only when all primary keys consist 
of a single attribute. On top of the limited applicability of each individual technique, there is the difficulty that 
complexity classifications expressed in terms of different techniques cannot be easily compared. 

In this paper, we make significant progress in the complexity classification task of CERTAINTY(q) for q 
ranging over the set of self-join-free conjunctive queries, by establishing the following effective complexity 
trichotomy: 

• Given a self-join-free Boolean conjunctive query q, it is decidable whether CERTAINTY(q) is in FO. 
In GD, this was only shown under the assumption that queries are acyclic (in the sense of 0). 

• Given a self-join-free Boolean conjunctive query q, if CERTAINTY(q) is not in FO, then it is L-hard. In 
previous works mm, Hanf locality was used to show first-order inexpressibility, resulting in involved 
proofs. The current paper takes a complexity-theoretic approach to first-order inexpressibility, which results 
in an easier proof of a stronger result. 

• For every self-join-free Boolean conjunctive query q, CERTAINTY(q) is either in P or coNP-complete, and 
the dichotomy is effective. In [11], this was only shown under the assumption that all primary keys are 
simple (i.e., consist of a single attribute). 

The established complexity trichotomy solves a problem that has been open since 2005 [9]. 


Organization This paper is organized as follows. Section 2 discusses related work. Section 3 introduces our 
data and query model. Section 4 defines attack graphs for Boolean conjunctive queries, extending an older notion 
of attack graph [14] that was defined exclusively for acyclic Boolean conjunctive queries. The section also states 
the main result of the paper, Theorem 2. Section 5 establishes an effective procedure that takes as input a self-join-free 
Boolean conjunctive query q, and decides whether CERTAINTY(q) is in FO. Section 6 provides a sufficient 
condition for coNP-hardness of CERTAINTY(q), for any self-join-free Boolean conjunctive query q. Section 7 
shows that if the condition is not satisfied, then CERTAINTY(q) is in P. The appendix contains the proofs of 
some non-trivial results. 


2 Related Work 

Consistent query answering (CQA) goes back to the seminal work by Arenas, Bertossi, and Chomicki 0. Fuxman 
and Miller [9] were the first ones to focus on CQA under the restrictions that consistency is only with respect to 
primary keys and that queries are self-join-free conjunctive. The term CERTAINTY(q) was coined in [14]. A 
recent and comprehensive survey on CERTAINTY(q) is [18]. 

Little is known about CERTAINTY(q) beyond self-join-free conjunctive queries. An interesting recent result 
by Fontaine 0 goes as follows. Let UCQ be the class of Boolean first-order queries that can be expressed 
as disjunctions of Boolean conjunctive queries (possibly with constants and self-joins). A daring conjecture is 
that for every query q in UCQ, CERTAINTY(q) is either in P or coNP-complete. Fontaine showed that this 
conjecture implies Bulatov's dichotomy theorem for conservative CSP E), the proof of which is highly involved 
(the full paper contains 66 pages). 


3 Preliminaries 


We assume disjoint sets of variables and constants. If $\vec{x}$ is a sequence containing variables and constants, then 
$\mathsf{vars}(\vec{x})$ denotes the set of variables that occur in $\vec{x}$. A valuation over a set U of variables is a total mapping $\theta$ from 
U to the set of constants. At several places, it is implicitly understood that such a valuation $\theta$ is extended to be the 
identity on constants and on variables not in U. If $V \subseteq U$, then $\theta[V]$ denotes the restriction of $\theta$ to V. 

If $\theta$ is a valuation over a set U of variables, x is a variable, and a is a constant, then $\theta_{[x \mapsto a]}$ is the valuation over 
$U \cup \{x\}$ such that $\theta_{[x \mapsto a]}(x) = a$ and for every variable y such that $y \neq x$, $\theta_{[x \mapsto a]}(y) = \theta(y)$. Notice that $x \in U$ 
is allowed. 


Atoms and key-equal facts Each relation name R of arity n, $n \geq 1$, has a unique primary key which is a set 
$\{1, 2, \dots, k\}$ where $1 \leq k \leq n$. We say that R has signature [n, k] if R has arity n and primary key $\{1, 2, \dots, k\}$. 
We say that R is simple-key if k = 1. Elements of the primary key are called primary-key positions, while k + 1, 
k + 2, ..., n are non-primary-key positions. For all positive integers n, k such that $1 \leq k \leq n$, we assume 
denumerably many relation names with signature [n, k]. 

If R is a relation name with signature [n, k], then $R(s_1, \dots, s_n)$ is called an R-atom (or simply atom), where each 
$s_i$ is either a constant or a variable ($1 \leq i \leq n$). Such an atom is commonly written as $R(\underline{\vec{x}}, \vec{y})$ where the primary 
key value $\vec{x} = s_1, \dots, s_k$ is underlined and $\vec{y} = s_{k+1}, \dots, s_n$. An R-fact (or simply fact) is an R-atom in which 
no variable occurs. Two facts $R_1(\underline{\vec{a}_1}, \vec{b}_1)$, $R_2(\underline{\vec{a}_2}, \vec{b}_2)$ are key-equal if $R_1 = R_2$ and $\vec{a}_1 = \vec{a}_2$. An R-atom or an 
R-fact is simple-key if R is simple-key. 

We will use letters F, G, H for atoms. For an atom $F = R(\underline{\vec{x}}, \vec{y})$, we denote by key(F) the set of variables 
that occur in $\vec{x}$, and by vars(F) the set of variables that occur in F, that is, $\mathsf{key}(F) = \mathsf{vars}(\vec{x})$ and 
$\mathsf{vars}(F) = \mathsf{vars}(\vec{x}) \cup \mathsf{vars}(\vec{y})$. 


Uncertain database, blocks, and repairs A database schema is a finite set of relation names. All constructs 
that follow are defined relative to a fixed database schema. 

An uncertain database is a finite set db of facts using only the relation names of the schema. We refer to databases 
as “uncertain databases” to stress that such databases can violate primary key constraints. 

We write adom(db) for the active domain of db (i.e., the set of constants that occur in db). A block of db is 
a maximal set of key-equal facts of db. The term R-block refers to a block of R-facts, i.e., facts with relation 
name R. If A is a fact of db, then block(A, db) denotes the block of db that contains A. An uncertain database 
db is consistent if no two distinct facts are key-equal (i.e., if every block of db is a singleton). A repair of db is 
a maximal (with respect to set containment) consistent subset of db. We write rset(db) for the set of repairs of 
db. 


Boolean conjunctive queries A Boolean query is a mapping q that associates a Boolean (true or false) to each 
uncertain database, such that q is closed under isomorphism [12]. We write $db \models q$ to denote that q associates 
true to db, in which case db is said to satisfy q. A Boolean first-order query is a Boolean query that can be defined 
in first-order logic. A Boolean conjunctive query is a finite set $q = \{R_1(\underline{\vec{x}_1}, \vec{y}_1), \dots, R_n(\underline{\vec{x}_n}, \vec{y}_n)\}$ of atoms. We 
denote by vars(q) the set of variables that occur in q. The set q represents the first-order sentence 

$$\exists u_1 \cdots \exists u_k \left( R_1(\underline{\vec{x}_1}, \vec{y}_1) \land \cdots \land R_n(\underline{\vec{x}_n}, \vec{y}_n) \right),$$

where $\{u_1, \dots, u_k\} = \mathsf{vars}(q)$. This query q is satisfied by uncertain database db if there exists a valuation $\theta$ over 
vars(q) such that for each $i \in \{1, \dots, n\}$, $R_i(\underline{\vec{a}}, \vec{b}) \in db$ with $\vec{a} = \theta(\vec{x}_i)$ and $\vec{b} = \theta(\vec{y}_i)$. 

We say that a Boolean conjunctive query q has a self-join if some relation name occurs more than once in q. If 
q has no self-join, then it is called self-join-free. By a little abuse of notation, we may confuse atoms with their 
relation names in a self-join-free Boolean conjunctive query q. That is, if we use a relation name R at places 
where an atom is expected, then we mean the (unique) R-atom of q. 
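
The satisfaction condition above amounts to a search for a valuation that maps every atom of q to a fact of db. A minimal backtracking sketch in Python follows; it is an illustration and not part of the paper. The encodings are assumptions: an atom or fact is a pair `(relation, terms)`, and a term is treated as a variable exactly when it is a string starting with `?`.

```python
def is_var(term):
    """Assumed convention: variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")

def satisfies(db, atoms, theta=None):
    """Return True if some valuation extending theta maps all atoms into db."""
    theta = {} if theta is None else theta
    if not atoms:
        return True
    (rel, terms), rest = atoms[0], atoms[1:]
    for fact_rel, values in db:
        if fact_rel != rel or len(values) != len(terms):
            continue
        new = dict(theta)
        # A variable must keep its earlier value; a constant must match exactly.
        if all(new.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(terms, values)) and satisfies(db, rest, new):
            return True
    return False

# The two-atom query {R(x, y), S(y, x)} on two hypothetical databases.
q = [("R", ("?x", "?y")), ("S", ("?y", "?x"))]
db_yes = {("R", (1, 2)), ("S", (2, 1))}
db_no = {("R", (1, 2)), ("S", (2, 3))}
```

With these inputs, `satisfies(db_yes, q)` holds and `satisfies(db_no, q)` does not, because no single valuation agrees on x in both atoms.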

If q is a Boolean conjunctive query, $\vec{x} = \langle x_1, \dots, x_\ell \rangle$ is a sequence of distinct variables that occur in q, and 
$\vec{a} = \langle a_1, \dots, a_\ell \rangle$ is a sequence of constants, then $q_{[\vec{x} \mapsto \vec{a}]}$ denotes the query obtained from q by replacing all 
occurrences of $x_i$ with $a_i$, for all $1 \leq i \leq \ell$. 


Typed uncertain databases For every variable x, we assume an infinite set of constants, denoted type(x), such 
that $x \neq y$ implies $\mathsf{type}(x) \cap \mathsf{type}(y) = \emptyset$. Let q be a self-join-free Boolean conjunctive query, and let db 
be an uncertain database. We say that db is typed relative to q if for every atom $R(x_1, \dots, x_n)$ in q, for every 
$i \in \{1, \dots, n\}$, if $x_i$ is a variable, then for every fact $R(a_1, \dots, a_n)$ in db, $a_i \in \mathsf{type}(x_i)$ and the constant $a_i$ does 
not occur in q. Significantly, since q is self-join-free, the assumption that uncertain databases are typed is without 
loss of generality. 


Purified uncertain databases Let q be a Boolean conjunctive query, and let db be an uncertain database. We 
say that a fact $A \in db$ is relevant for q in db if for some valuation $\theta$ over vars(q), $A \in \theta(q) \subseteq db$. We say that 
db is purified relative to q if every fact $A \in db$ is relevant for q in db. 
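
Relevance can be tested directly from the definition by enumerating the valuations $\theta$ with $\theta(q) \subseteq db$. The Python sketch below is an illustration under assumed encodings (atoms and facts as `(relation, terms)` pairs, variables as strings starting with `?`). It only checks the definition; it is not the polynomial-time purification procedure of the literature, which must also preserve the answer to CERTAINTY(q).

```python
def is_var(term):
    """Assumed convention: variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")

def embeddings(db, atoms, theta=None):
    """Yield every valuation theta (as a dict) such that theta(atoms) is a subset of db."""
    theta = {} if theta is None else theta
    if not atoms:
        yield dict(theta)
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for fact_rel, values in db:
        if fact_rel != rel or len(values) != len(terms):
            continue
        new = dict(theta)
        if all(new.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(terms, values)):
            yield from embeddings(db, rest, new)

def relevant_facts(db, atoms):
    """Collect all facts that occur in theta(q) for some embedding theta."""
    found = set()
    for theta in embeddings(db, atoms):
        for rel, terms in atoms:
            found.add((rel, tuple(theta[t] if is_var(t) else t for t in terms)))
    return found

def is_purified(db, atoms):
    return relevant_facts(db, atoms) == set(db)

# Hypothetical instance: the S-fact with key 9 joins with no R-fact.
q = [("R", ("?x", "?y")), ("S", ("?y", "?z"))]
db = {("R", (1, 2)), ("S", (2, 3)), ("S", (9, 9))}
```

On this instance the fact S(9, 9) is not relevant, so `db` is not purified relative to `q`, while `relevant_facts(db, q)` is.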


Frugal repairs For every uncertain database db, Boolean conjunctive query q, and $X \subseteq \mathsf{vars}(q)$, we define a 
preorder $\preceq_q^X$ on rset(db), as follows. For every two repairs $r_1, r_2$, we define $r_1 \preceq_q^X r_2$ if for every valuation $\theta$ 
over X, $r_1 \models \theta(q)$ implies $r_2 \models \theta(q)$. Here, $\theta(q)$ is the query obtained from q by replacing all occurrences of each 
$x \in X$ with $\theta(x)$; variables not in X remain unaffected (i.e., $\theta$ is understood to be the identity on variables not in 
X). Clearly, $\preceq_q^X$ is a preorder (i.e., it is reflexive and transitive), and its minimal elements are called $\preceq_q^X$-frugal 
repairs.¹ 


Functional dependencies Let q be a Boolean conjunctive query. A functional dependency for q is an expression 
$X \to Y$ where $X, Y \subseteq \mathsf{vars}(q)$. We say that an uncertain database db satisfies $X \to Y$ for q, denoted $db \models_q X \to Y$, 
if for all valuations $\theta, \mu$ over vars(q) such that $\theta(q), \mu(q) \subseteq db$, if $\theta[X] = \mu[X]$, then $\theta[Y] = \mu[Y]$. 

Example 1 The relation R shown next does not satisfy the standard functional dependency $2 \to 3$, because its 
tuples agree on the second position, but disagree on the third position. Nevertheless, for $q = \exists y \exists z\, R(a, y, z)$, we 
have $R \models_q y \to z$. The second tuple of R is not relevant for the query, because a and d are distinct constants; the 
relation R' is purified relative to q. 


    R   1  2  3          R'  1  2  3
        a  b  c              a  b  c
        d  b  f
                                        ◁
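
The contrast in Example 1 between a standard functional dependency and a functional dependency for q can be replayed in a few lines of Python. This is a sketch under assumed encodings (facts as `(relation, terms)` pairs, variables as strings starting with `?`, positions counted from 0 in `satisfies_standard_fd`); none of the function names come from the paper.

```python
def is_var(term):
    """Assumed convention: variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")

def embeddings(db, atoms, theta=None):
    """Yield every valuation theta (as a dict) such that theta(atoms) is a subset of db."""
    theta = {} if theta is None else theta
    if not atoms:
        yield dict(theta)
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for fact_rel, values in db:
        if fact_rel != rel or len(values) != len(terms):
            continue
        new = dict(theta)
        if all(new.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(terms, values)):
            yield from embeddings(db, rest, new)

def satisfies_fd_for_q(db, atoms, X, Y):
    """db satisfies X -> Y for q: embeddings agreeing on X also agree on Y."""
    thetas = list(embeddings(db, atoms))
    return all(any(t1[x] != t2[x] for x in X) or all(t1[y] == t2[y] for y in Y)
               for t1 in thetas for t2 in thetas)

def satisfies_standard_fd(rows, lhs, rhs):
    """Standard FD between two 0-based positions of a single relation."""
    return all(r1[rhs] == r2[rhs] for r1 in rows for r2 in rows if r1[lhs] == r2[lhs])

# Relation R and query q = R(a, y, z) from Example 1; 'a' is a constant.
R = {("R", ("a", "b", "c")), ("R", ("d", "b", "f"))}
q = [("R", ("a", "?y", "?z"))]
rows = [values for _rel, values in R]
```

The standard dependency from position 2 to position 3 fails on `rows`, yet `satisfies_fd_for_q(R, q, ["?y"], ["?z"])` holds because only the first tuple embeds the query.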


Certain query answering For every Boolean conjunctive query q, the decision problem CERTAINTY(q) takes 
as input an uncertain database db, and asks whether q is satisfied by every repair of db. 

It is easy to show the following upper bound on the complexity of CERTAINTY(q). 

Theorem 1 For every Boolean first-order query q, CERTAINTY(q) is in coNP. 

The following two lemmas are useful in the study of the complexity of CERTAINTY(q). 

Lemma 1 ([17]) Let q be a Boolean conjunctive query. Let db be an uncertain database. It is possible to compute 
in polynomial time an uncertain database db' that is purified relative to q such that every repair of db satisfies q 
if and only if every repair of db' satisfies q. 

¹ $r_1$ is minimal if for all $r_2$, if $r_2 \preceq_q^X r_1$ then $r_1 \preceq_q^X r_2$. 

Figure 1: Attack graph of the query in Example 2. 


Lemma 2 Let q be a self-join-free Boolean conjunctive query, and $X \subseteq \mathsf{vars}(q)$. Let db be an uncertain 
database. Then, every repair of db satisfies q if and only if every $\preceq_q^X$-frugal repair of db satisfies q. 


4 Attack Graphs 


Attack graphs were introduced in [14] for studying first-order expressibility of CERTAINTY(q) for acyclic (in 
the sense of [4]) self-join-free conjunctive queries q. Here, we extend the notion of attack graph to all (cyclic or 
acyclic) self-join-free conjunctive queries. 

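Let q be a self-join-free Boolean conjunctive query. We define $\mathcal{K}(q)$ as the following set of functional 
dependencies: 

$$\mathcal{K}(q) := \{\mathsf{key}(F) \to \mathsf{vars}(F) \mid F \in q\}$$

For every atom $F \in q$, we define $F^{+,q}$ and $F^{\boxplus,q}$ as the following sets of variables. 

$$F^{+,q} := \{x \in \mathsf{vars}(q) \mid \mathcal{K}(q \setminus \{F\}) \models \mathsf{key}(F) \to x\}$$

$$F^{\boxplus,q} := \{x \in \mathsf{vars}(q) \mid \mathcal{K}(q) \models \mathsf{key}(F) \to x\}$$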
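
Both sets are attribute closures of key(F) and can be computed with the textbook fixpoint iteration for functional dependencies. The Python sketch below is an illustration, not the paper's code. A query is encoded as a dict from atom names to `(key variables, all variables)`; the function names `plus` and `boxplus` are invented here. The sample query has five atoms R(x, y), S(y, z), T(z, x), U(x, u), V(x, u, v), and its primary keys are an assumption made for this example.

```python
def closure(attrs, fds):
    """Closure of a set of variables under functional dependencies (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def K(query, skip=None):
    """The dependencies key(F) -> vars(F), optionally leaving out one atom."""
    return [(key, vars_) for name, (key, vars_) in query.items() if name != skip]

def plus(query, F):
    """F^{+,q}: closure of key(F) under K(q minus {F})."""
    return closure(query[F][0], K(query, skip=F))

def boxplus(query, F):
    """Second closure: closure of key(F) under all of K(q)."""
    return closure(query[F][0], K(query))

fs = frozenset
# Assumed keys: x for R and U, y for S, z for T, and {x, u} for V.
q = {
    "R": (fs("x"), fs("xy")),
    "S": (fs("y"), fs("yz")),
    "T": (fs("z"), fs("zx")),
    "U": (fs("x"), fs("xu")),
    "V": (fs("xu"), fs("xuv")),
}
```

Under these assumed keys, `plus(q, "R")` evaluates to `{x, u, v}` and `boxplus(q, "R")` contains all five variables.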


The attack graph of q is a directed graph whose vertices are the atoms of q. There is a directed edge from F to G 
($F \neq G$) if there exists a sequence 

$$F_0, F_1, \dots, F_n \qquad (1)$$

of (not necessarily distinct) atoms of q such that 

• $F_0 = F$ and $F_n = G$; and 

• for all $i \in \{0, \dots, n-1\}$, $\mathsf{vars}(F_i) \cap \mathsf{vars}(F_{i+1}) \nsubseteq F^{+,q}$. 

A directed edge from F to G in the attack graph of q is also called an attack from F to G, denoted by $F \stackrel{q}{\rightsquigarrow} G$. 
The sequence (1) is called a witness for the attack $F \stackrel{q}{\rightsquigarrow} G$. We will often add variables to a witness: if we write 
$F_0 \stackrel{z_1}{\frown} F_1 \stackrel{z_2}{\frown} F_2 \cdots \stackrel{z_n}{\frown} F_n$, then it is understood that for $i \in \{1, \dots, n\}$, $z_i \in \mathsf{vars}(F_{i-1}) \cap \mathsf{vars}(F_i)$ and $z_i \notin F_0^{+,q}$. 

If $F \stackrel{q}{\rightsquigarrow} G$, then we also say that F attacks G (or that G is attacked by F). 

An attack from F to G is called weak if $\mathcal{K}(q) \models \mathsf{key}(F) \to \mathsf{key}(G)$; otherwise it is strong. A directed cycle in the 
attack graph of q is called weak if all attacks in the cycle are weak; otherwise the cycle is called strong. 


Example 2 Let $q = \{R(x, y), S(y, z), T(z, x), U(x, u), V(x, u, v)\}$. By a little abuse of notation, we denote 
each atom by its relation name (e.g., R is used to denote the atom R(x, y)). We have $R^{+,q} = \{x, u, v\}$. A witness 
for $R \stackrel{q}{\rightsquigarrow} T$ is $R \stackrel{y}{\frown} S \stackrel{z}{\frown} T$. The complete attack graph is shown in Fig. 1. All attacks are weak. ◁
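
The attack graph of Example 2 can be recomputed from the definitions with a reachability search. The following Python sketch is an illustration rather than the paper's procedure. The extracted text does not show which positions are underlined, so the primary keys used below (x for R and U, y for S, z for T, and {x, u} for V) are an assumption; they were chosen to agree with the stated $R^{+,q} = \{x, u, v\}$. The dict encoding and the function names are likewise invented for this example.

```python
def closure(attrs, fds):
    """Closure of a set of variables under functional dependencies (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def attack_graph(query):
    """Map each attack (F, G) to 'weak' or 'strong'."""
    all_fds = list(query.values())
    edges = {}
    for F, (f_key, _f_vars) in query.items():
        f_plus = closure(f_key, [fd for name, fd in query.items() if name != F])
        f_boxplus = closure(f_key, all_fds)
        # Follow shared variables that lie outside F^{+,q}.
        reached, frontier = {F}, [F]
        while frontier:
            cur = frontier.pop()
            for G, (_g_key, g_vars) in query.items():
                if G not in reached and (query[cur][1] & g_vars) - f_plus:
                    reached.add(G)
                    frontier.append(G)
        for G in reached - {F}:
            # Weak attack: K(q) implies key(F) -> key(G).
            edges[(F, G)] = "weak" if query[G][0] <= f_boxplus else "strong"
    return edges

fs = frozenset
q = {  # name: (assumed key variables, all variables)
    "R": (fs("x"), fs("xy")),
    "S": (fs("y"), fs("yz")),
    "T": (fs("z"), fs("zx")),
    "U": (fs("x"), fs("xu")),
    "V": (fs("xu"), fs("xuv")),
}
edges = attack_graph(q)
```

Under the assumed keys, R attacks S and T, each of S and T attacks all other atoms, U attacks V, and V attacks nothing; every attack is labelled weak, in line with the example.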


The above notion of attack graph is purely syntactic. Semantically, an attack from an R-atom to an S-atom in the 
attack graph of q means that there exists an uncertain database db such that every repair of db satisfies q, and 
such that two R-facts of the same R-block join exclusively with two S-facts belonging to distinct S-blocks. For 
the query of Example 2, such a database could be $db = \{R(1, a), R(1, b), S(a, \alpha), S(b, \beta), \dots\}$, in which the 
two R-facts belong to the same R-block, and R(1, a) joins exclusively with $S(a, \alpha)$, and R(1, b) joins exclusively 
with $S(b, \beta)$, and the two S-facts belong to distinct S-blocks. Therefore, the attack graph of Fig. 1 contains a 
directed edge from the R-atom to the S-atom. 

Equipped with the notion of attack graph, we can now present the effective complexity trichotomy in the set 
{CERTAINTY(q) | q is a self-join-free Boolean conjunctive query}. 

Theorem 2 (Trichotomy Theorem) Let q be a self-join-free Boolean conjunctive query. 

1. If the attack graph of q is acyclic, then CERTAINTY(q) is in FO. 

2. If the attack graph of q is cyclic but contains no strong cycle, then CERTAINTY(q) is in P and is L-hard. 

3. If the attack graph of q contains a strong cycle, then CERTAINTY(q) is coNP-complete. 
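
Once an attack graph with weak and strong labels is available, the three cases of the theorem reduce to a cycle test. The Python sketch below is an illustration and not part of the paper: it takes the attack graph as a dict from edges `(F, G)` to the label `"weak"` or `"strong"` and returns the complexity class named by Theorem 2. The sample graphs are entered by hand, and their labels are assumptions about the underlying keys.

```python
def reachable(edges, src):
    """Vertices reachable from src by a directed path of length at least one."""
    seen, stack = set(), [src]
    while stack:
        cur = stack.pop()
        for f, g in edges:
            if f == cur and g not in seen:
                seen.add(g)
                stack.append(g)
    return seen

def classify(edges):
    """Apply the trichotomy to a labelled attack graph."""
    # An edge (f, g) lies on a directed cycle exactly when f is reachable from g.
    on_cycle = [(f, g) for (f, g) in edges if f in reachable(edges, g)]
    if not on_cycle:
        return "FO"
    if any(edges[e] == "strong" for e in on_cycle):
        return "coNP-complete"
    return "P and L-hard"

# Hand-entered attack graphs (labels assumed for illustration).
acyclic = {("R", "S"): "weak"}
weak_cycle = {("R0", "S0"): "weak", ("S0", "R0"): "weak"}
strong_cycle = {("R1", "S1"): "strong", ("S1", "R1"): "weak"}
```

A strong edge that lies on no cycle does not make the problem intractable, which is why the test inspects only edges on cycles.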

The rest of the paper presents the proof of Theorem 2. We first present some properties of attack graphs that will 
be useful in subsequent sections. 

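Lemma 3 Let q be a self-join-free Boolean conjunctive query. If $F \stackrel{q}{\rightsquigarrow} G$ and $G \stackrel{q}{\rightsquigarrow} H$, then either $F \stackrel{q}{\rightsquigarrow} H$ or 
$G \stackrel{q}{\rightsquigarrow} F$ (or both). 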

Lemma 4 Let q be a self-join-free Boolean conjunctive query. 

1. If the attack graph of q contains a cycle, then it contains a cycle of size two. 

2. If the attack graph of q contains a strong cycle, then it contains a strong cycle of size two. 

Lemma 5 Let q be a self-join-free Boolean conjunctive query. Let $x \in \mathsf{vars}(q)$ and let a be an arbitrary constant. 

1. If the attack graph of q is acyclic, then the attack graph of $q_{[x \mapsto a]}$ is acyclic. 

2. If the attack graph of q contains no strong cycle, then the attack graph of $q_{[x \mapsto a]}$ contains no strong cycle. 

We conclude this section with three definitions. The following definition is taken from 0 and applies to directed 
graphs in general. 

Definition 1 A directed graph is strongly connected if there is a directed path from any vertex to any other. The 
maximal strongly connected subgraphs of a graph are vertex-disjoint and are called its strong components. If $S_1$ 
and $S_2$ are strong components such that an edge leads from a vertex in $S_1$ to a vertex in $S_2$, then $S_1$ is a predecessor 
of $S_2$ and $S_2$ is a successor of $S_1$. A strong component is called initial if it has no predecessor. ◁
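
Strong components and initial strong components can be computed by mutual reachability. The following Python sketch is a simple illustration (quadratic rather than linear time, and not taken from the paper). The edge set below is a hand-entered five-vertex graph used only as sample input; all names are assumptions.

```python
def reachable(edges, src):
    """Vertices reachable from src by a directed path of length at least one."""
    seen, stack = set(), [src]
    while stack:
        cur = stack.pop()
        for f, g in edges:
            if f == cur and g not in seen:
                seen.add(g)
                stack.append(g)
    return seen

def strong_components(vertices, edges):
    """Group vertices that are mutually reachable."""
    reach = {v: reachable(edges, v) | {v} for v in vertices}
    return {frozenset(u for u in vertices if u in reach[v] and v in reach[u])
            for v in vertices}

def initial_components(vertices, edges):
    """Strong components with no incoming edge from another component."""
    return {c for c in strong_components(vertices, edges)
            if not any(f not in c and g in c for f, g in edges)}

vertices = ["R", "S", "T", "U", "V"]
edges = {("R", "S"), ("R", "T"),
         ("S", "T"), ("S", "R"), ("S", "U"), ("S", "V"),
         ("T", "R"), ("T", "S"), ("T", "U"), ("T", "V"),
         ("U", "V")}
```

For this sample graph the strong components are {R, S, T}, {U}, and {V}, and only {R, S, T} is initial.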

Strong components in the attack graph should not be confused with strong attacks. 

Example 3 In the attack graph of Fig. 1, the atoms R(x, y), S(y, z), and T(z, x) together constitute an initial 
strong component. ◁

So far we have defined an attack from an atom to another atom. The following definition introduces attacks from 
an atom to a variable. 

Definition 2 Let q be a self-join-free Boolean conjunctive query. Let R be a relation name with signature [1, 1] 
such that R does not occur in q. For $F \in q$ and $z \in \mathsf{vars}(q)$, we say that F attacks z, denoted $F \stackrel{q}{\rightsquigarrow} z$, if $F \stackrel{q'}{\rightsquigarrow} R(z)$ 
where $q' = q \cup \{R(z)\}$. ◁

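Example 4 Clearly, if $F_0 \stackrel{z_1}{\frown} F_1 \cdots \stackrel{z_n}{\frown} F_n$ is a witness for $F_0 \stackrel{q}{\rightsquigarrow} F_n$, then $F_0 \stackrel{q}{\rightsquigarrow} z_i$ for every $i \in \{1, \dots, n\}$. 
Notice also that if $q = \{R(x, y)\}$, then the attack graph of q contains no edge, yet $R \stackrel{q}{\rightsquigarrow} y$. ◁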

Finally, we introduce the notion of sequential proof , which mimics an algorithm for testing logical implication for 
functional dependencies 0 Algorithm 8.2.7]. 

Definition 3 Let q be a self-join-free Boolean conjunctive query. Let $X \subseteq \mathsf{vars}(q)$ and $y \in \mathsf{vars}(q)$. A sequential 
proof of $\mathcal{K}(q) \models X \to y$ is a sequence $H_0, H_1, \dots, H_\ell$ of atoms of q such that 

• $y \in X \cup \bigcup_{j=0}^{\ell} \mathsf{vars}(H_j)$; and 

• for $i \in \{0, \dots, \ell\}$, $\mathsf{key}(H_i) \subseteq X \cup \bigcup_{j=0}^{i-1} \mathsf{vars}(H_j)$. 

Notice that if $y \in X$, then the empty sequence is a sequential proof of $\mathcal{K}(q) \models X \to y$. 
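
Checking whether a given sequence of atoms is a sequential proof is a single left-to-right pass. The Python sketch below is an illustration under assumed encodings: a query is a dict from atom names to `(key variables, all variables)`, and the sample query with its primary keys is hypothetical.

```python
def is_sequential_proof(query, X, y, proof):
    """Check the two conditions of a sequential proof of K(q) |= X -> y."""
    known = set(X)
    for name in proof:
        key, vars_ = query[name]
        # The key of each atom must be covered by X and the earlier atoms.
        if not key <= known:
            return False
        known |= vars_
    return y in known

fs = frozenset
q = {  # name: (assumed key variables, all variables)
    "R": (fs("x"), fs("xy")),
    "S": (fs("y"), fs("yz")),
    "U": (fs("x"), fs("xu")),
    "V": (fs("xu"), fs("xuv")),
}
```

For instance, the sequence U, V is a sequential proof of $\mathcal{K}(q) \models \{x\} \to v$ for this sample query, whereas the sequence V alone is not, because u is not yet known when V is used.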

5 First-Order Expressibility 

In this section, we prove the first item in the statement of Theorem 2, as well as the L-hard lower complexity 
bound stated in the second item. 

Theorem 3 Let q be a self-join-free Boolean conjunctive query. Then the following are equivalent: 

1. CERTAINTY(q) is in FO; 

2. the attack graph of q is acyclic. 

That is, acyclicity of the attack graph of q is both a necessary and sufficient condition for first-order expressibility 
of CERTAINTY(q). In Section 5.1, we show the contrapositive of the implication 1 ⇒ 2. In Section 5.2, we 
show the implication 2 ⇒ 1. 

5.1 Necessary Condition 

Let $q_0 = \{R_0(x, y), S_0(y, x)\}$. In 031, it was shown that CERTAINTY($q_0$) is not in FO. The following lemma 
shows a stronger result. 

Lemma 6 Let $q_0 = \{R_0(x, y), S_0(y, x)\}$. Then CERTAINTY($q_0$) is L-hard. 

Lemma 7 Let q be a self-join-free Boolean conjunctive query. If the attack graph of q is cyclic, then CERTAINTY(q) 
is L-hard (and hence not in FO). 

Proof Assume that the attack graph of q is cyclic. We show hereinafter that there exists a first-order many-one 
reduction from CERTAINTY($q_0$) to CERTAINTY(q). The desired result then follows from Lemma 6. 

By Lemma 4, we can assume two distinct atoms $F, G \in q$ such that $F \stackrel{q}{\rightsquigarrow} G \stackrel{q}{\rightsquigarrow} F$ is an attack cycle of size two. 
We will assume hereinafter that the relation names in F and G are R and S respectively. 

For all constants a, b we define the valuation $\Theta^a_b$ over vars(q) as follows. Let $\bot$ be a fixed constant not occurring 
elsewhere. For every variable $u \in \mathsf{vars}(q)$, 

1. if $u \in F^{+,q} \setminus G^{+,q}$, then $\Theta^a_b(u) = a$; 

2. if $u \in G^{+,q} \setminus F^{+,q}$, then $\Theta^a_b(u) = b$; 

3. if $u \in F^{+,q} \cap G^{+,q}$, then $\Theta^a_b(u) = \bot$; 

4. if $u \in \mathsf{vars}(q) \setminus (F^{+,q} \cup G^{+,q})$, then $\Theta^a_b(u) = \langle a, b \rangle$. 
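
The four cases can be transcribed directly. The Python sketch below is only an illustration of the case distinction: the sets `f_plus` and `g_plus` stand for $F^{+,q}$ and $G^{+,q}$ and are supplied by hand, the string `"⊥"` stands for the fixed constant $\bot$, pairs are Python tuples, and the sample sets are hypothetical.

```python
BOTTOM = "⊥"  # stands for the fixed constant that occurs nowhere else

def theta(a, b, f_plus, g_plus, variables):
    """Build the valuation Theta^a_b as a dict from variables to constants."""
    valuation = {}
    for u in variables:
        in_f, in_g = u in f_plus, u in g_plus
        if in_f and not in_g:
            valuation[u] = a
        elif in_g and not in_f:
            valuation[u] = b
        elif in_f and in_g:
            valuation[u] = BOTTOM
        else:
            valuation[u] = (a, b)
    return valuation

# Hypothetical closures for a query over the variables x, y, z, u, v.
f_plus, g_plus = {"x", "u", "v"}, {"y"}
val = theta("a1", "b1", f_plus, g_plus, ["x", "y", "z", "u", "v"])
```

Here the variables in `f_plus` are sent to `a1`, the variable y is sent to `b1`, and z, which lies in neither set, is sent to the pair `("a1", "b1")`.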

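Sublemma 1 For all constants a, b, a', b', if $H \in q \setminus \{F, G\}$, then $\{\Theta^a_b(H), \Theta^{a'}_{b'}(H)\}$ is consistent. 

Proof of Sublemma 1 Assume that for all $u \in \mathsf{key}(H)$, $\Theta^a_b(u) = \Theta^{a'}_{b'}(u)$. We distinguish four cases. 

Case a = a' and b = b'. Then $\Theta^a_b(H) = \Theta^{a'}_{b'}(H)$. 

Case a = a' and $b \neq b'$. Then $\mathsf{key}(H) \subseteq F^{+,q}$, hence $\mathsf{vars}(H) \subseteq F^{+,q}$. Then $\Theta^a_b(H) = \Theta^{a'}_{b'}(H)$. 

Case $a \neq a'$ and b = b'. Then $\mathsf{key}(H) \subseteq G^{+,q}$, hence $\mathsf{vars}(H) \subseteq G^{+,q}$. Then $\Theta^a_b(H) = \Theta^{a'}_{b'}(H)$. 

Case $a \neq a'$ and $b \neq b'$. Then $\mathsf{key}(H) \subseteq F^{+,q} \cap G^{+,q}$, hence $\mathsf{vars}(H) \subseteq F^{+,q} \cap G^{+,q}$. Then $\Theta^a_b(H) = \Theta^{a'}_{b'}(H)$. 

⊣ 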

Sublemma 2 For all constants a, b, a', b', 

1. $\Theta^a_b(F)$ and $\Theta^{a'}_{b'}(F)$ are key-equal if and only if a = a'. 

2. $\Theta^a_b(F) = \Theta^{a'}_{b'}(F)$ if and only if a = a' and b = b'. 

3. $\Theta^a_b(G)$ and $\Theta^{a'}_{b'}(G)$ are key-equal if and only if b = b'. 

4. $\Theta^a_b(G) = \Theta^{a'}_{b'}(G)$ if and only if a = a' and b = b'. 

Proof of Sublemma 2 

1 ⇒ Consequence of $\mathsf{key}(F) \nsubseteq G^{+,q}$ (because $G \stackrel{q}{\rightsquigarrow} F$). 
1 ⇐ Consequence of $\mathsf{key}(F) \subseteq F^{+,q}$. 

2 ⇒ Consequence of $\mathsf{vars}(F) \nsubseteq F^{+,q}$ (because $F \stackrel{q}{\rightsquigarrow} G$). 
2 ⇐ Trivial. 

The proof of the remaining items is analogous. ⊣ 


For every uncertain database db with $R_0$-facts and $S_0$-facts, we define f(db) as the following uncertain 
database: 

1. for every $R_0(a, b)$ in db, f(db) contains $\Theta^a_b(q \setminus \{G\})$; and 

2. for every $S_0(b, a)$ in db, f(db) contains $\Theta^a_b(q \setminus \{F\})$. 

It is easy to see that f is computable in FO. 

In what follows, we assume that db is typed, as explained in Section 3. It will be understood that $a, a_1, a_2, \dots$ 
belong to type(x), and that $b, b_1, b_2, \dots$ belong to type(y). 

Let us define g(db) as follows: 

$$g(db) := f(db) \setminus \left( \{\Theta^a_b(F) \mid R_0(a, b) \in db\} \cup \{\Theta^a_b(G) \mid S_0(b, a) \in db\} \right).$$

That is, g(db) contains all facts of f(db) that are neither R-facts nor S-facts. 

By Sublemmas 1 and 2, 

$$\mathsf{rset}(f(db)) = \{f(r) \cup g(db) \mid r \in \mathsf{rset}(db)\}. \qquad (2)$$

Let db be an arbitrary database with $R_0$-facts and $S_0$-facts. It suffices to show that the following are equivalent 
for every repair r of db: 

1. r satisfies $q_0$; 

2. $f(r) \cup g(db)$ satisfies q. 


1 ⇒ 2 This is the easier part. 

2 ⇒ 1 Let $\theta$ be a substitution over vars(q) such that $\theta(q) \subseteq f(r) \cup g(db)$. 

By our construction, we can assume $R_0(a, b) \in r$ such that $\theta(F) \in \Theta^a_b(q \setminus \{G\})$. Likewise, we can assume 
$S_0(b', a') \in r$ such that $\theta(G) \in \Theta^{a'}_{b'}(q \setminus \{F\})$. 

It suffices to show that a = a' and b = b'. 

Before giving the proof, we provide some intuition. For every fact $A \in f(db)$, we can assume an atom in q, 
denoted $H_A$, such that $A = \Theta^a_b(H_A)$ for some constant $a \in \mathsf{type}(x)$ and some constant $b \in \mathsf{type}(y)$. Then, for 
all $z \in \mathsf{vars}(H_A)$, $\Theta^a_b(z) \in \{\bot, a, b, \langle a, b \rangle\}$. The constants in the latter set allow us to "trace back" A to some facts 
$R_0(a, b)$ or $S_0(b, a)$ in db. 

With this intuition in mind, it is easy to show b = b' (the proof of a = a' is symmetrical). Since $F \stackrel{q}{\rightsquigarrow} G$, there 
exists a sequence $F_0, \dots, F_n$ of atoms of q such that 

• $F_0 = F$ and $F_n = G$; and 

• for all $i \in \{0, \dots, n-1\}$, we can assume $u_i \in \mathsf{vars}(F_i) \cap \mathsf{vars}(F_{i+1})$ such that $u_i \notin F^{+,q}$. 

We show by induction on increasing i that for all $i \in \{0, \dots, n-1\}$, there exists constant $a_i$ such that for all 
$w_i \in \mathsf{vars}(F_i)$, we have $\theta(w_i) \in \{\bot, a_i, b, \langle a_i, b \rangle\}$. 

Basis i = 0. Since $\theta(F) \in \Theta^a_b(q \setminus \{G\})$, for all $w_0 \in \mathsf{vars}(F_0)$, we have $\theta(w_0) \in \{\bot, a, b, \langle a, b \rangle\}$. 

Step i → i + 1. By the induction hypothesis, there exists constant $a_i$ such that for all $w_i \in \mathsf{vars}(F_i)$, we have 
$\theta(w_i) \in \{\bot, a_i, b, \langle a_i, b \rangle\}$. 

From $u_i \notin F^{+,q}$, it follows that $\theta(u_i) \in \{b, \langle a_i, b \rangle\}$. 

Since $u_i \in \mathsf{vars}(F_{i+1})$, it follows that there exists constant $a_{i+1}$ such that for all $w_{i+1} \in \mathsf{vars}(F_{i+1})$, we 
have $\theta(w_{i+1}) \in \{\bot, a_{i+1}, b, \langle a_{i+1}, b \rangle\}$. 

It follows that for $u_{n-1} \in \mathsf{vars}(G)$, there exists constant $a_{n-1}$ such that $\theta(u_{n-1}) \in \{b, \langle a_{n-1}, b \rangle\}$. From 
$\theta(G) \in \Theta^{a'}_{b'}(q \setminus \{F\})$, it follows $\theta(u_{n-1}) \in \{b', \langle a', b' \rangle\}$. Consequently, b = b'. □ 

5.2 Sufficient Condition 

In this section, we show that CERTAINTY (q) is in FO if the attack graph of q is acyclic. 

Lemma 8 Let q be a self-join-free Boolean conjunctive query. Let F be an atom of q such that in the attack graph 
of q, the indegree of F is zero. Let k = |key(F)| and let $\vec{x} = \langle x_1, \dots, x_k \rangle$ be a sequence containing (exactly once) 
each variable of key(F). Then the following are equivalent for every uncertain database db: 

1. q is true in every repair of db; 

2. for some $\vec{a} \in (\mathsf{adom}(db))^k$, it is the case that $q_{[\vec{x} \mapsto \vec{a}]}$ is true in every repair of db. 

Lemma 8 immediately leads to the following result. 

Lemma 9 Let q be a self-join-free Boolean conjunctive query. If the attack graph of q is acyclic, then CERTAINTY(q) 
is in FO. 

Proof Assume that the attack graph of q is acyclic. 

The proof runs by induction on |q|. If |q| = 0, then CERTAINTY(q) is obviously in FO. 

Let db be an instance of CERTAINTY(q). Since the attack graph of q is acyclic, we can assume an atom $R(\underline{\vec{x}}, \vec{y})$ 
that is not attacked in the attack graph of q. By Lemma 8, the following are equivalent: 

1. q is true in every repair of db. 

2. For some fact $R(\underline{\vec{a}}, \vec{b}) \in db$, there exists a valuation $\theta$ over $\mathsf{vars}(\vec{x})$ such that $\theta(\vec{x}) = \vec{a}$ and such that for 
all key-equal facts $R(\underline{\vec{a}}, \vec{b}')$ in db, the valuation $\theta$ can be extended to a valuation $\theta^+$ over $\mathsf{vars}(\vec{x}) \cup \mathsf{vars}(\vec{y})$ 
such that $\theta^+(\vec{y}) = \vec{b}'$ and $\theta^+(q')$ is true in every repair of db, where $q' = q \setminus \{R(\underline{\vec{x}}, \vec{y})\}$. 

From Lemma 5, it follows that the attack graph of $\theta^+(q')$ is acyclic, and hence CERTAINTY($\theta^+(q')$) is in FO 
by the induction hypothesis. It is then clear that the latter condition (2) can be checked in FO. □ 


For a self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) can be equivalently defined as 
the set containing every uncertain database db such that every repair of db satisfies q. If CERTAINTY(q) is in 
FO, then the set CERTAINTY(q) is definable in first-order logic (by definition of the complexity class FO). If 
CERTAINTY(q) is in FO, then its first-order definition is commonly called a first-order rewriting. Such a 
first-order rewriting is actually an implementation, in first-order logic, of the algorithm in the proof of Lemma 9. This 
is illustrated next. 

Example 5 Let $q = \{R(x, y), S(y, b)\}$, where b is a constant. The attack graph of q contains a single directed 
edge, from the R-atom to the S-atom. The first-order definition of CERTAINTY(q) is as follows: 

$$\exists x \Big( \exists y\, R(x, y) \land \forall y \big( R(x, y) \to ( S(y, b) \land \forall z ( S(y, z) \to z = b ) ) \big) \Big).$$

◁
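
The rewriting of Example 5 can be evaluated directly on a database and compared against a brute-force check over all repairs. The Python sketch below is an illustration and not part of the paper. It assumes that the first position is the primary key of both R and S, stores each relation as a set of pairs, and uses invented function names.

```python
from itertools import product

def certain_by_rewriting(R, S, b):
    """Evaluate: exists x (exists y R(x,y) and forall y (R(x,y) ->
    (S(y,b) and forall z (S(y,z) -> z = b))))."""
    for x in {key for key, _ in R}:
        ys = [y for key, y in R if key == x]
        if all((y, b) in S and all(z == b for key, z in S if key == y) for y in ys):
            return True
    return False

def repairs(rel):
    """All repairs of one relation whose first position is the primary key."""
    blocks = {}
    for key, value in rel:
        blocks.setdefault(key, []).append((key, value))
    return [set(choice) for choice in product(*blocks.values())]

def certain_by_brute_force(R, S, b):
    """Check q = {R(x, y), S(y, b)} in every repair (exponential time)."""
    return all(any((y, b) in s for _x, y in r)
               for r in repairs(R) for s in repairs(S))

R = {(1, "p"), (1, "q")}
S_good = {("p", "b"), ("q", "b")}
S_bad = {("p", "b"), ("q", "b"), ("q", "c")}
```

With `S_good` both checks succeed; with `S_bad` both fail, because the repair that keeps R(1, q) and S(q, c) falsifies the query.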


Figure 2: Help for the proof of Theorem 4 (Venn diagram over vars(q)). 


6 Intractability Result 


In this section, we prove the coNP-hard lower complexity bound stated in the third item of Theorem 2. 

Theorem 4 Let q be a self-join-free Boolean conjunctive query. If the attack graph of q contains a strong cycle, 
then CERTAINTY(q) is coNP-hard. 

Proof Assume that the attack graph of q contains a strong cycle. By Lemma 4, we can assume $F, G \in q$ such that 
$F \stackrel{q}{\rightsquigarrow} G \stackrel{q}{\rightsquigarrow} F$ and the attack $F \stackrel{q}{\rightsquigarrow} G$ is strong. We will assume hereinafter that the relation names in F and G are 
R and S respectively. 

Let $q_1 = \{R_1(x, y), S_1(y, z, x)\}$. We show hereinafter that there exists a polynomial-time (and even first-order) 
many-one reduction from CERTAINTY($q_1$) to CERTAINTY(q). Since it is known [10] that CERTAINTY($q_1$) is 
coNP-hard, it follows that CERTAINTY(q) is coNP-hard. 

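For all constants a, b, c, we define $\Theta^{a}_{b,c}$ as the following valuation over vars(q) (see Fig. 2 for a mnemonic). Let $\bot$
be some fixed constant.

1. If $u \in F^{+,q} \cap G^{+,q}$, then $\Theta^{a}_{b,c}(u) = \bot$;

2. if $u \in F^{+,q} \setminus G^{+,q}$, then $\Theta^{a}_{b,c}(u) = a$;

3. if $u \in G^{+,q} \setminus F^{\boxplus,q}$, then $\Theta^{a}_{b,c}(u) = \langle b, c \rangle$;

4. if $u \in (G^{+,q} \cap F^{\boxplus,q}) \setminus F^{+,q}$, then $\Theta^{a}_{b,c}(u) = b$;

5. if $u \in F^{\boxplus,q} \setminus (F^{+,q} \cup G^{+,q})$, then $\Theta^{a}_{b,c}(u) = \langle a, b \rangle$; and

6. if $u \notin F^{\boxplus,q} \cup G^{+,q}$, then $\Theta^{a}_{b,c}(u) = \langle a, b, c \rangle$.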
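
The six cases partition vars(q) according to membership in three sets. As a small illustration only, the case analysis can be written as a function; the parameters `f_plus`, `f_box`, and `g_plus` are hypothetical names standing for $F^{+,q}$, $F^{\boxplus,q}$, and $G^{+,q}$, and the sketch assumes $F^{+,q} \subseteq F^{\boxplus,q}$.

```python
BOTTOM = "⊥"  # the fixed constant written ⊥ in the text


def theta(u, a, b, c, f_plus, f_box, g_plus):
    """Value assigned to variable u by the valuation with parameters a, b, c.

    f_plus, f_box, g_plus are sets of variables; f_plus is assumed to be a
    subset of f_box. Tuples play the role of the composite constants.
    """
    in_fp, in_fb, in_gp = u in f_plus, u in f_box, u in g_plus
    if in_fp and in_gp:        # case 1
        return BOTTOM
    if in_fp:                  # case 2: in F+ but not in G+
        return a
    if in_gp and not in_fb:    # case 3
        return (b, c)
    if in_gp:                  # case 4: in G+ and F-box, not in F+
        return b
    if in_fb:                  # case 5: in F-box only
        return (a, b)
    return (a, b, c)           # case 6: outside all three sets
```

Reading off the return values shows which constants are visible in each region, which is the information the case distinctions below rely on.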

Sublemma 3 For all constants a, b, c, a', b', c', if $H \in q \setminus \{F, G\}$, then $\{\Theta^{a}_{b,c}(H), \Theta^{a'}_{b',c'}(H)\}$ is consistent.

Proof of Sublemma 3 Assume that for all $u \in \mathrm{key}(H)$,

\[
\Theta^{a}_{b,c}(u) = \Theta^{a'}_{b',c'}(u). \tag{3}
\]

We distinguish four cases.

Case a = a' and b = b'. If c = c', then $\Theta^{a}_{b,c}(H) = \Theta^{a'}_{b',c'}(H)$. Assume next $c \neq c'$. From (3), it follows
$\mathrm{key}(H) \subseteq F^{\boxplus,q}$. Consequently, $\mathrm{vars}(H) \subseteq F^{\boxplus,q}$. Since c does not occur inside $F^{\boxplus,q}$ in the Venn diagram
of Fig. 2, we have $\Theta^{a}_{b,c}(H) = \Theta^{a'}_{b',c'}(H)$.

Case a = a' and $b \neq b'$. From (3), it follows $\mathrm{key}(H) \subseteq F^{+,q}$, hence $\mathrm{vars}(H) \subseteq F^{+,q}$. Since b and c do not
occur inside $F^{+,q}$ in the Venn diagram, $\Theta^{a}_{b,c}(H) = \Theta^{a'}_{b',c'}(H)$.

Case $a \neq a'$ and b = b'. First assume c = c'. From (3), it follows $\mathrm{key}(H) \subseteq G^{+,q}$, hence $\mathrm{vars}(H) \subseteq G^{+,q}$.
Since a does not occur inside $G^{+,q}$ in the Venn diagram, $\Theta^{a}_{b,c}(H) = \Theta^{a'}_{b',c'}(H)$.

Next assume $c \neq c'$. From (3), it follows $\mathrm{key}(H) \subseteq F^{\boxplus,q} \cap G^{+,q}$, hence $\mathrm{vars}(H) \subseteq F^{\boxplus,q} \cap G^{+,q}$. Since a
and c do not occur inside $F^{\boxplus,q} \cap G^{+,q}$ in the Venn diagram, $\Theta^{a}_{b,c}(H) = \Theta^{a'}_{b',c'}(H)$.

Case $a \neq a'$ and $b \neq b'$. From (3), it follows $\mathrm{key}(H) \subseteq F^{+,q} \cap G^{+,q}$, hence $\mathrm{vars}(H) \subseteq F^{+,q} \cap G^{+,q}$. Since
a, b, c do not occur inside $F^{+,q} \cap G^{+,q}$ in the Venn diagram, $\Theta^{a}_{b,c}(H) = \Theta^{a'}_{b',c'}(H)$.

⊣


Sublemma 4 For all constants a, b, c, a', b', c',

1. $\Theta^{a}_{b,c}(F)$ and $\Theta^{a'}_{b',c'}(F)$ are key-equal iff a = a'.

2. $\Theta^{a}_{b,c}(F) = \Theta^{a'}_{b',c'}(F)$ iff a = a' and b = b'.

3. $\Theta^{a}_{b,c}(G)$ and $\Theta^{a'}_{b',c'}(G)$ are key-equal iff b = b' and c = c'.

4. $\Theta^{a}_{b,c}(G) = \Theta^{a'}_{b',c'}(G)$ iff a = a' and b = b' and c = c'.


Proof of Sublemma 4

1 ⇒ Consequence of $\mathrm{key}(F) \not\subseteq G^{+,q}$ (because $G \leadsto F$). 1 ⇐ Consequence of $\mathrm{key}(F) \subseteq F^{+,q}$.

2 ⇒ Consequence of item 1 and $\mathrm{vars}(F) \not\subseteq F^{+,q}$ (because $F \leadsto G$). 2 ⇐ Consequence of $\mathrm{vars}(F) \subseteq F^{\boxplus,q}$.

3 ⇒ Consequence of $\mathrm{key}(G) \not\subseteq F^{\boxplus,q}$ (because $F \leadsto G$ is a strong attack). 3 ⇐ Consequence of $\mathrm{key}(G) \subseteq G^{+,q}$.

4 ⇒ Consequence of item 3 and $\mathrm{vars}(G) \not\subseteq G^{+,q}$ (because $G \leadsto F$). 4 ⇐ Trivial.

⊣


Let db be an uncertain database with $R_1$-facts and $S_1$-facts. In what follows, we assume that db is typed, as
explained in Section 3. It will be understood that $a, a_1, a_2, \ldots$ belong to type(x), that $b, b_1, b_2, \ldots$ belong to
type(y), and that $c, c_1, c_2, \ldots$ belong to type(z).

Let h(db) be the subset of db such that

1. h(db) contains all $S_1$-facts of db; and

2. h(db) contains every $R_1$-block **b** of db such that for every fact $R_1(a, b)$ in **b**, there exists some constant c
such that $S_1(b, c, a)$ is in db.

Clearly, the computation of h(db) from db is in FO, and the following are equivalent:

1. every repair of db satisfies $q_1$;

2. every repair of h(db) satisfies $q_1$.
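
The filtering step h can be spelled out concretely. The sketch below is illustrative only; it assumes a hypothetical encoding in which an $R_1$-fact is a triple ("R1", a, b) and an $S_1$-fact is a 4-tuple ("S1", b, c, a).

```python
def h(db):
    """Keep all S1-facts, and keep an R1-block only if each of its facts
    R1(a, b) is matched by some S1(b, c, a) in db."""
    s1_facts = {f for f in db if f[0] == "S1"}

    def supported(a, b):
        # Is there a constant c with S1(b, c, a) in db?
        return any(f[1] == b and f[3] == a for f in s1_facts)

    # Group R1-facts into blocks by their primary-key value a.
    r1_blocks = {}
    for f in db:
        if f[0] == "R1":
            r1_blocks.setdefault(f[1], set()).add(f)

    kept = set(s1_facts)
    for block in r1_blocks.values():
        if all(supported(a, b) for (_, a, b) in block):
            kept |= block
    return kept
```

A block that contains an unsupported fact is dropped as a whole, because a repair may select that fact, in which case the block contributes nothing to satisfying $q_1$.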

We define f(db) as the following uncertain database:

1. for every pair $\{R_1(a, b), S_1(b, c, a)\}$ contained in h(db), f(db) contains $\Theta^{a}_{b,c}(q \setminus \{G\})$; and

2. for every $S_1(b, c, a)$ in h(db), f(db) contains $\Theta^{a}_{b,c}(q \setminus \{F\})$.

It is easy to see that f is computable in FO.

Let g(db) be the subset of f(db) containing all facts of f(db) that are neither R-facts nor S-facts.
By Sublemmas 3 and 4,

\[
\mathrm{rset}(f(\mathbf{db})) = \{ f(\mathbf{r}) \cup g(\mathbf{db}) \mid \mathbf{r} \in \mathrm{rset}(\mathbf{db}) \}. \tag{4}
\]

Let db be an arbitrary database with $R_1$-facts and $S_1$-facts. It suffices to show that the following are equivalent
for every repair r of db:

1. r satisfies $q_1$;

2. $f(\mathbf{r}) \cup g(\mathbf{db})$ satisfies q.

1 ⇒ 2 This is the easier part.

2 ⇒ 1 Let $\theta$ be a substitution over vars(q) such that $\theta(q) \subseteq f(\mathbf{r}) \cup g(\mathbf{db})$. By our construction, we can assume
$R_1(a, b) \in \mathbf{r}$ and some constant c such that $\theta(F) \in \Theta^{a}_{b,c}(q \setminus \{G\})$. Likewise, we can assume $S_1(b', c', a') \in \mathbf{r}$
such that $\theta(G) \in \Theta^{a'}_{b',c'}(q \setminus \{F\})$. It suffices to show that a = a' and b = b'.

b = b' Since $F \leadsto G$, there exists a sequence $F_0, F_1, \ldots, F_n$ of distinct atoms of q such that

• $F_0 = F$ and $F_n = G$; and

• for all $i \in \{0, \ldots, n-1\}$, we can assume $u_i \in \mathrm{vars}(F_i) \cap \mathrm{vars}(F_{i+1})$ such that $u_i \notin F^{+,q}$.

We show by induction on increasing i that for all $i \in \{0, \ldots, n-1\}$, there exist constants $a_i$ and $c_i$ such that for
all $w_i \in \mathrm{vars}(F_i)$, we have $\theta(w_i) \in \{\bot, a_i, b, \langle a_i, b \rangle, \langle b, c_i \rangle, \langle a_i, b, c_i \rangle\}$.

Basis i = 0. Since $\theta(F) \in \Theta^{a}_{b,c}(q \setminus \{G\})$, for all $w_0 \in \mathrm{vars}(F_0)$, we have $\theta(w_0) \in \{\bot, a, b, \langle a, b \rangle, \langle b, c \rangle,
\langle a, b, c \rangle\}$.

Step i → i + 1. By the induction hypothesis, there exist constants $a_i$ and $c_i$ such that for all $w_i \in \mathrm{vars}(F_i)$, we
have $\theta(w_i) \in \{\bot, a_i, b, \langle a_i, b \rangle, \langle b, c_i \rangle, \langle a_i, b, c_i \rangle\}$.

From $u_i \notin F^{+,q}$, it follows that $\theta(u_i) \in \{b, \langle a_i, b \rangle, \langle b, c_i \rangle, \langle a_i, b, c_i \rangle\}$.

Since $u_i \in \mathrm{vars}(F_{i+1})$, it follows that there exist constants $a_{i+1}$ and $c_{i+1}$ such that for all $w_{i+1} \in
\mathrm{vars}(F_{i+1})$, we have $\theta(w_{i+1}) \in \{\bot, a_{i+1}, b, \langle a_{i+1}, b \rangle, \langle b, c_{i+1} \rangle, \langle a_{i+1}, b, c_{i+1} \rangle\}$.

It follows that for $u_{n-1} \in \mathrm{vars}(G)$, there exist constants $a_{n-1}$ and $c_{n-1}$ such that $\theta(u_{n-1}) \in \{b, \langle a_{n-1}, b \rangle,
\langle b, c_{n-1} \rangle, \langle a_{n-1}, b, c_{n-1} \rangle\}$. From $\theta(G) \in \Theta^{a'}_{b',c'}(q \setminus \{F\})$, it follows $\theta(u_{n-1}) \in \{b', \langle a', b' \rangle, \langle b', c' \rangle, \langle a', b', c' \rangle\}$.
Consequently, b = b'.

a = a' Analogous.

□


7 Polynomial Tractability 

In this section, we prove the P upper complexity bound stated in the second item of Theorem 2.

Theorem 5 Let q be a self-join-free Boolean conjunctive query. If the attack graph of q contains no strong cycle,
then CERTAINTY(q) is in P.


Road map The proof of Theorem 5 is technically involved. We start by introducing in Section 7.1 an extension
of the data model that allows some syntactic simplifications, expressed in Section 7.2. In Section 7.3, we introduce
the notion of Markov cycle, and show how the “dissolution” of Markov cycles is helpful in the proof of Theorem 5,
which is given in Section 7.4. The dissolution of Markov cycles is explained in detail in Section 7.5.

7.1 Relations Known to Be Consistent 

We conservatively extend our data model. We first distinguish between two kinds of relation names: those that 
can be inconsistent, and those that cannot. 
Relations known to be consistent Every relation name has a unique and fixed mode, which is an element in
{i, c}. It will come in handy to think of i and c as inconsistent and consistent respectively. We often write $R^c$ to
denote that R is a relation name with mode c. If q is a self-join-free Boolean conjunctive query, then [q] denotes
the subset of q containing each atom whose relation name has mode c. The inconsistency count of q, denoted
incnt(q), is the number of relation names with mode i in q. Modes carry over to atoms and facts: the mode of an
atom $R(\vec{x}, \vec{y})$ or a fact $R(\vec{a}, \vec{b})$ is the mode of R.

The intended semantics is that if a relation name R has mode c, then the set of R-facts of an uncertain database
will always be consistent.


Certain query answering with consistent and inconsistent relations The problem CERTAINTY(q) now takes
as input an uncertain database db such that for every relation name R in q, if R has mode c, then the set of R-facts
of db is consistent. The problem is to determine whether every repair of db satisfies q.

All results shown in previous sections carry over to the new setting, by assuming that all relation names used so far
had mode i. Furthermore, as stated by Proposition 1 (which has an easy proof), relation names with mode c can
be simulated exclusively by means of relation names with mode i. Therefore, having relation names with mode c
will be convenient, but is not fundamental.

Proposition 1 Let q be a self-join-free Boolean conjunctive query. Let $R^c(\vec{x}, \vec{y})$ be an atom with mode c in q. Let
$R_1$ and $R_2$ be two relation names, both with mode i and with the same signature as R, such that neither $R_1$ nor
$R_2$ occurs in q. Let $q' = (q \setminus \{R^c(\vec{x}, \vec{y})\}) \cup \{R_1(\vec{x}, \vec{y}), R_2(\vec{x}, \vec{y})\}$. Then CERTAINTY(q) and CERTAINTY(q')
are equivalent under first-order reductions.

If relation names with mode c are allowed for syntactic convenience, the definition of $F^{+,q}$ needs a slight change:

\[
F^{+,q} := \{ x \in \mathrm{vars}(q) \mid \mathcal{K}((q \setminus \{F\}) \cup [q]) \models \mathrm{key}(F) \rightarrow x \}
\]

Modulo this redefinition, the notion of attack graph remains unchanged.
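
The redefined set $F^{+,q}$ is an ordinary attribute closure under functional dependencies. The following sketch is an illustration under simplifying assumptions: atoms are represented by a hypothetical `Atom` record that lists only variables (constants are ignored), and each atom G contributes the dependency key(G) → vars(G).

```python
from typing import FrozenSet, NamedTuple


class Atom(NamedTuple):
    name: str
    mode: str                   # "i" (possibly inconsistent) or "c" (consistent)
    key: FrozenSet[str]         # variables at primary-key positions
    variables: FrozenSet[str]   # all variables of the atom


def closure(start, atoms):
    """Closure of a variable set under the dependencies key(G) -> vars(G)."""
    closed = set(start)
    changed = True
    while changed:
        changed = False
        for atom in atoms:
            if atom.key <= closed and not atom.variables <= closed:
                closed |= atom.variables
                changed = True
    return closed


def f_plus(q, f):
    """Closure of key(f) under the atoms of (q minus {f}) together with [q]."""
    consistent = [g for g in q if g.mode == "c"]
    others = [g for g in q if g != f]
    return closure(f.key, others + consistent)
```

The only difference with the earlier definition is the term [q]: an atom of mode c contributes its own dependency even when it is the atom F under consideration.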

Proposition 1 explains how to replace atoms with mode c. Conversely, the following lemma states that in pursuing
a proof for Theorem 5, there are cases where a self-join-free Boolean conjunctive query can be extended with
atoms of mode c.

Lemma 10 Let q be a self-join-free Boolean conjunctive query. Let $x, z \in \mathrm{vars}(q)$ such that $\mathcal{K}(q) \models x \rightarrow z$ and
for every $F \in q$, if $\mathcal{K}(q) \models x \rightarrow \mathrm{key}(F)$, then $F \not\leadsto x$ and $F \not\leadsto z$. Let $q' = q \cup \{T^c(x, z)\}$, where T is a fresh
relation name with mode c. Then,

1. there exists a polynomial-time many-one reduction from CERTAINTY(q) to CERTAINTY(q'); and

2. if the attack graph of q contains no strong cycle, then the attack graph of q' contains no strong cycle either.


Saturated queries Given a self-join-free Boolean conjunctive query, the reduction of Lemma 10 can be repeated
until it can no longer be applied. The query so obtained will be called saturated.

Definition 4 Let q be a self-join-free Boolean conjunctive query. We say that q is saturated if whenever $x, z \in
\mathrm{vars}(q)$ such that $\mathcal{K}(q) \models x \rightarrow z$ and $\mathcal{K}([q]) \not\models x \rightarrow z$, then there exists an atom $F \in q$ with $\mathcal{K}(q) \models x \rightarrow \mathrm{key}(F)$
such that $F \leadsto x$ or $F \leadsto z$. ◁

Example 6 Consider the query $q = \{R(x, y), S_1(y, z), S_2(y, z), T^c(x, z, w), U(w, x)\}$. We have $\mathcal{K}(q) \models y \rightarrow z$
and $\mathcal{K}([q]) \not\models y \rightarrow z$. The set $\{F \in q \mid \mathcal{K}(q) \models y \rightarrow \mathrm{key}(F)\}$ equals $\{S_1, S_2\}$. We have neither $S_1 \leadsto y$ nor
$S_1 \leadsto z$. Likewise, neither $S_2 \leadsto y$ nor $S_2 \leadsto z$. Hence, q is not saturated. By Lemma 10, there exists a
polynomial-time many-one reduction from CERTAINTY(q) to CERTAINTY(q') with $q' = q \cup \{S^c(y, z)\}$, where
S is a fresh relation name with mode c. It can be verified that the query q' is saturated. ◁
7.2 Syntactic Simplifications


The following lemma shows that any proof of Theorem 5 can assume some syntactic simplifications without loss
of generality.

Lemma 11 Let q be a self-join-free Boolean conjunctive query. There exists a polynomial-time many-one
reduction from CERTAINTY(q) to CERTAINTY(q') for some self-join-free Boolean conjunctive query q' with the
following properties:

• $\mathrm{incnt}(q') \leq \mathrm{incnt}(q)$;

• no atom in q' contains two occurrences of the same variable;

• constants occur in q' exclusively at the primary-key position of simple-key atoms;

• every atom with mode i in q' is simple-key;

• q' is saturated; and

• if the attack graph of q contains no strong cycle, then the attack graph of q' contains no strong cycle
either.

7.3 Dissolving Markov Cycles 

The following definition introduces Markov graphs.

Definition 5 Let q be a self-join-free Boolean conjunctive query such that every atom with mode i in q is simple-
key. For every $x \in \mathrm{vars}(q)$, we define

\[
C^{q}(x) := \{ F \in q \mid F \text{ has mode } i \text{ and } \mathrm{key}(F) = \{x\} \}.
\]

Notice that $C^{q}(x)$ can be empty.

The Markov graph of q is a directed graph whose vertex set is vars(q). There is a directed edge from x to y,
denoted $x \xrightarrow{M}_{q} y$, if $x \neq y$ and $\mathcal{K}(C^{q}(x) \cup [q]) \models x \rightarrow y$. If the query q is clear from the context, then $x \xrightarrow{M}_{q} y$
can be shortened into $x \xrightarrow{M} y$. We write $x \xrightarrow{M}{}^{*}_{q} y$ (or $x \xrightarrow{M}{}^{*} y$ if q is clear from the context) if the Markov graph
of q contains a directed path from x to y.² Notice that for every $x \in \mathrm{vars}(q)$, $x \xrightarrow{M}{}^{*} x$.

An elementary directed cycle C in the Markov graph of q is said to be premier if there exists a variable $x \in \mathrm{vars}(q)$
such that

1. $\{x\} = \mathrm{key}(F_0)$ for some atom $F_0$ with mode i that belongs to an initial strong component of the attack
graph of q; and

2. for some y in C, we have $x \xrightarrow{M}{}^{*} y$ and $\mathcal{K}(q) \models y \rightarrow x$.

The term Markov edge is used for an edge in the Markov graph; likewise for Markov path and Markov cycle. ◁

Example 7 Let $q = \{R(x, y, v), S(y, x), V_1^c(v, w), W(w, v), V_2^c(w, y)\}$. All atoms in q are simple-key. Then,
$[q] = \{V_1^c(v, w), V_2^c(w, y)\}$.

We have $C^{q}(x) = \{R(x, y, v)\}$. Since $\mathcal{K}(C^{q}(x) \cup [q]) \models x \rightarrow \{y, v, w\}$, the Markov graph of q contains directed
edges from x to each of y, v, and w.

We have $C^{q}(v) = \emptyset$. Since $\mathcal{K}(C^{q}(v) \cup [q]) \models v \rightarrow \{y, w\}$, the Markov graph of q contains directed edges from
v to both y and w. The complete Markov graph of q is shown in Fig. 3 (right).

The attack graph of q is shown in Fig. 3 (left). The atoms R(x, y, v) and S(y, x) together constitute an initial
strong component of the attack graph. It is then straightforward that each cycle in the Markov graph of q that
contains x or y must be premier. Further, the cycle v, w, v in the Markov graph of q is also premier, because there
is a Markov path from x to v, and $\mathcal{K}(q) \models v \rightarrow x$. ◁
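
The Markov graph of Example 7 can be recomputed mechanically from Definition 5. The sketch below is illustrative; it assumes that every atom is simple-key with a variable at its key position, and it encodes an atom as a hypothetical tuple (name, mode, key variable, set of variables).

```python
def closure(start, fds):
    """Closure of a variable set under dependencies given as (lhs, rhs) pairs."""
    closed = set(start)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed


def markov_graph(atoms):
    """Edges (x, y) with x != y such that the atoms of mode i keyed by x,
    together with all atoms of mode c, make x determine y."""
    variables = set().union(*(a[3] for a in atoms))
    consistent = [(frozenset({a[2]}), frozenset(a[3])) for a in atoms if a[1] == "c"]
    edges = set()
    for x in variables:
        c_x = [(frozenset({a[2]}), frozenset(a[3]))
               for a in atoms if a[1] == "i" and a[2] == x]
        for y in closure({x}, c_x + consistent) - {x}:
            edges.add((x, y))
    return edges


# The query of Example 7.
EXAMPLE_7 = [
    ("R", "i", "x", {"x", "y", "v"}),
    ("S", "i", "y", {"y", "x"}),
    ("V1", "c", "v", {"v", "w"}),
    ("W", "i", "w", {"w", "v"}),
    ("V2", "c", "w", {"w", "y"}),
]
```

Calling `markov_graph(EXAMPLE_7)` yields, among others, the edges from x to y, v, w and from v to y, w that are stated in the example.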

2 The term Markov refers to the intuition that in a Markov path, each variable functionally determines the next variable in the path,
independently of preceding variables.

Figure 3: Attack graph (left) and Markov graph (right) of the query $\{R(x, y, v), S(y, x), V_1^c(v, w), W(w, v),
V_2^c(w, y)\}$.


Let q be like in Definition 5, and assume that the Markov graph of q contains an elementary directed cycle C.
Lemma 12 states that CERTAINTY(q) can be reduced in polynomial time to CERTAINTY(q*), where q* is
obtained from q by “dissolving” the Markov cycle C as defined in Definition 6. Moreover, we will show (Lemma 13)
that if C is premier and the attack graph of q contains no strong cycle, then the attack graph of q* will contain no
strong cycle either. The reduction that “dissolves” Markov cycles will be the central idea in our polynomial-time
algorithm for CERTAINTY(q) when the attack graph of q contains no strong cycle.


Definition 6 Let q be a self-join-free Boolean conjunctive query such that every atom with mode i in q is simple-
key. Let C be an elementary directed cycle of length $k \geq 2$ in the Markov graph of q. Then, dissolve(C, q) denotes
the self-join-free Boolean conjunctive query defined next. Let $x_0, \ldots, x_{k-1}$ be the variables in C, and let $q_0 =
\bigcup_{i=0}^{k-1} C^{q}(x_i)$. Let $\vec{y}$ be a sequence of variables containing exactly once each variable of $\mathrm{vars}(q_0) \setminus \{x_0, \ldots, x_{k-1}\}$.
Let $q_1 = \{T(u, x_0, \ldots, x_{k-1}, \vec{y})\} \cup \{U_i^c(x_i, u)\}_{i=0}^{k-1}$, where u is a fresh variable, T is a fresh relation name with
mode i, and $U_0, \ldots, U_{k-1}$ are fresh relation names with mode c. Then, we define

\[
\mathrm{dissolve}(C, q) := (q \setminus q_0) \cup q_1.
\]

Notice that dissolve(C, q) is unique up to a renaming of the variable u and the relation names in $q_1$. ◁

Example 8 Let q be the query of Fig. 3. Let C be the cycle x, w, y, x in the Markov graph of q. Using the notation
of Definition 6, we have

\[
\begin{aligned}
q_0 &= \{R(x, y, v),\ S(y, x),\ W(w, v)\} \\
q_1 &= \{T(u, x, w, y, v),\ U_x^c(x, u),\ U_w^c(w, u),\ U_y^c(y, u)\}
\end{aligned}
\]

Hence, $\mathrm{dissolve}(C, q) = \{V_1^c(v, w), V_2^c(w, y), T(u, x, w, y, v), U_x^c(x, u), U_w^c(w, u), U_y^c(y, u)\}$.

◁
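
Definition 6 is purely syntactic, so it can be sketched as a small query transformation. The code below is an illustration rather than the construction used in the proofs: atoms are hypothetical tuples (name, mode, key variable, tuple of variables), the fresh relation names are generated as `T` and `U_<variable>`, and the primary key of T is assumed to be the fresh variable u.

```python
def dissolve(cycle, atoms, fresh_var="u"):
    """Replace the mode-i atoms keyed by a cycle variable (q0) by one fresh
    mode-i atom T and one fresh mode-c atom per cycle variable (q1)."""
    q0 = [a for a in atoms if a[1] == "i" and a[2] in cycle]

    # Variables of q0 outside the cycle, each listed once, in order of appearance.
    extra = []
    for atom in q0:
        for v in atom[3]:
            if v not in cycle and v not in extra:
                extra.append(v)

    t_atom = ("T", "i", fresh_var, (fresh_var, *cycle, *extra))
    u_atoms = [(f"U_{x}", "c", x, (x, fresh_var)) for x in cycle]
    rest = [a for a in atoms if a not in q0]
    return rest + [t_atom] + u_atoms


# The query of Fig. 3 and the cycle x, w, y, x of Example 8.
FIG_3_QUERY = [
    ("R", "i", "x", ("x", "y", "v")),
    ("S", "i", "y", ("y", "x")),
    ("V1", "c", "v", ("v", "w")),
    ("W", "i", "w", ("w", "v")),
    ("V2", "c", "w", ("w", "y")),
]
DISSOLVED = dissolve(["x", "w", "y"], FIG_3_QUERY)
```

On this input, three atoms of mode i are replaced by the single mode-i atom T, which is the decrease in inconsistency count that the induction in Section 7.4 relies on.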


Lemma 12 Let q be a self-join-free Boolean conjunctive query such that every atom with mode i in q is simple-
key. Let C be an elementary directed cycle in the Markov graph of q, and let q* = dissolve(C, q). Then, there
exists a polynomial-time many-one reduction from CERTAINTY(q) to CERTAINTY(q*).


The reduction of Lemma 12 will be explained in Section 7.5. To use the reduction in a proof of Theorem 5, two
more results are needed:


• First, we need to show that the “dissolution” of Markov cycles can be done while keeping the attack graph
free of strong cycles (this is Lemma 13). This turns out to be true only for Markov cycles that are premier
(as defined in Definition 5).

• Second, we need to show the existence of premier Markov cycles that can be “dissolved” (this is Lemma 14).


Lemma 13 Let q be a self-join-free Boolean conjunctive query such that every atom with mode i in q is simple-key.
Let C be an elementary directed cycle in the Markov graph of q such that C is premier, and let q* = dissolve(C, q).
If the attack graph of q contains no strong cycle, then the attack graph of q* contains no strong cycle either.


Lemma 14 Let q be a self-join-free Boolean conjunctive query such that

• for every atom $F \in q$, if F has mode i, then F is simple-key and $\mathrm{key}(F) \neq \emptyset$;

• q is saturated;

• the attack graph of q contains no strong cycle; and

• the attack graph of q contains an initial strong component with two or more atoms.

Then, the Markov graph of q contains an elementary directed cycle C that is premier and such that for every y in C,
$C^{q}(y) \neq \emptyset$.

The condition $C^{q}(y) \neq \emptyset$, for every y in C, guarantees that dissolve(C, q) will contain strictly fewer atoms of mode
i than q. This condition will be used in the proof of Theorem 5, which runs by induction on the number of atoms
with mode i. The following example shows that Lemma 14 is no longer true if q is not saturated.

Example 9 Continuing Example 6. The query q of Example 6 is not saturated, but satisfies all other conditions in
the statement of Lemma 14. In particular, the attack graph of q contains a weak cycle $R \leadsto U \leadsto R$, which is part
of an initial strong component. The Markov graph of q consists of a single path $w \xrightarrow{M} x \xrightarrow{M} y \xrightarrow{M} z$, and hence
is acyclic.

The query q' of Example 6 is saturated, and we have $x \xrightarrow{M} w \xrightarrow{M} x$, a Markov cycle which can be shown to be
premier. ◁

7.4 The Proof of Theorem 5


Proof of Theorem 5 Assume that the attack graph of q contains no strong cycle. The proof runs by induction on
increasing incnt(q). The desired result is obvious if incnt(q) = 0. Assume that incnt(q) > 0 in the remainder of
the proof. Let db be an uncertain database that is input to CERTAINTY(q).

First, we reduce in polynomial time CERTAINTY(q) to CERTAINTY(q') with q' like in Lemma 11. We now
distinguish two cases.

Case q' contains an atom F with mode i that has zero indegree in the attack graph of q'. We can assume
either $F = R(x, \vec{y})$ or $F = R(a, \vec{y})$, where $\vec{y}$ is a sequence of distinct variables. In the remainder, we treat the
case $F = R(x, \vec{y})$ (the case $F = R(a, \vec{y})$ is even simpler).

Let $q'' = q' \setminus \{R(x, \vec{y})\}$. By Lemma 8, every repair of db satisfies q' if and only if db includes an R-block
**b** (there are only polynomially many such blocks) such that for every $R(a, \vec{b}) \in$ **b**, every repair of db satisfies
$q''_{[x, \vec{y} \mapsto a, \vec{b}]}$. By Lemma 5, the attack graph of $q''_{[x, \vec{y} \mapsto a, \vec{b}]}$ contains no strong cycle. From $\mathrm{incnt}(q''_{[x, \vec{y} \mapsto a, \vec{b}]}) =
\mathrm{incnt}(q') - 1 < \mathrm{incnt}(q)$, it follows that CERTAINTY($q''_{[x, \vec{y} \mapsto a, \vec{b}]}$) is in P by the induction hypothesis. It follows
that CERTAINTY(q) is in P as well.

Case every atom F with mode i in q' has an incoming attack in the attack graph of q'. It will be the case
that no constant occurs in an atom of mode i in q'.

Then, the attack graph of q' must contain an initial strong component with two or more atoms. By Lemma 14,
the Markov graph of q' contains an elementary directed cycle C that is premier and such that for every y in C,
$C^{q'}(y) \neq \emptyset$. By Lemma 12, we can reduce in polynomial time CERTAINTY(q') to CERTAINTY(q*) where
q* = dissolve(C, q'). Since the attack graph of q' contains no strong cycle, it follows by Lemma 13 that the attack
graph of q* contains no strong cycle either.

Let $k \geq 2$ be the size of C. It can be easily verified that $\mathrm{incnt}(q^*) \leq (\mathrm{incnt}(q') - k) + 1 < \mathrm{incnt}(q')$. By the
induction hypothesis, CERTAINTY(q*) is in P. Since there exists a polynomial-time reduction from CERTAINTY(q)
to CERTAINTY(q*), we conclude that CERTAINTY(q) is in P as well. □
7.5 The Reduction of Lemma 12

This section first describes the reduction of Lemma 12, and then proves the lemma.


Relevance of subsets of repairs In Section 3, we distinguished database facts that are relevant for a query from
those that are not. This notion is extended next.


Definition 7 Let q be a self-join-free Boolean conjunctive query, and let db be an uncertain database. A consistent
subset s of db is said to be grelevant for q in db (generalized relevant) if it can be extended into a repair r of db
such that some fact of s is relevant for q in r. ◁

It can be seen that $A \in \mathbf{db}$ is relevant for q in db if and only if {A} is grelevant for q in db. Therefore, “grelevant”
is a notion that generalizes “relevant.”


Lemma 15 Let q be a self-join-free Boolean conjunctive query, and let db be an uncertain database. Let s be
a consistent subset of db that is not grelevant for q in db. Let $\mathbf{db}_0 = \bigcup \{\mathrm{block}(A, \mathbf{db}) \mid A \in \mathbf{s}\}$. Then, the
following are equivalent:


1. every repair of db satisfies q;

2. every repair of $\mathbf{db} \setminus \mathbf{db}_0$ satisfies q.


Proof 1 ⇒ 2 By contraposition. Let r be a repair of $\mathbf{db} \setminus \mathbf{db}_0$ that falsifies q. Then, $\mathbf{r} \cup \mathbf{s}$ is a repair of db. If
$\mathbf{r} \cup \mathbf{s} \models q$, then, since r falsifies q, some fact of s must be relevant for q in $\mathbf{r} \cup \mathbf{s}$, so s is grelevant for q in db, a
contradiction. We conclude that $\mathbf{r} \cup \mathbf{s} \not\models q$. 2 ⇒ 1 Trivial. □


Introductory example The following example illustrates the main ideas behind the reduction of Lemma 12.

Example 10 Let q be a self-join-free Boolean conjunctive query. Assume that q includes $q_0 = \{R(x, y), S(y, z),
V(z, x)\}$. Then, the Markov graph of q contains a cycle $x \xrightarrow{M} y \xrightarrow{M} z \xrightarrow{M} x$. Let db be an uncertain database
that is purified relative to q. Let $\mathbf{db}_0$ be the subset of db containing all R-facts, S-facts, and V-facts of db.
Assume that the following three tables represent all facts of $\mathbf{db}_0$ (for convenience, we use variables as attribute
names, and we blur the distinction between a relation name R and a table representing a set of R-facts).

    R | x  y      S | y  z      V | z  x
    --+------     --+------     --+------
      | 1  a        | a  α        | α  1      }  db_01
      |             | a  κ        | κ  1      }
    - - - - - - - - - - - - - - - - - - - -
      | 2  b        | b  β        | β  2      }  db_02
      | 2  c        | c  γ        | γ  2      }
    - - - - - - - - - - - - - - - - - - - -
      | 3  d        | d  δ        | δ  3      }
      | 3  e        | e  ε        | ε  3      }  db_03
      | 4  e        | e  δ        | δ  4      }
      | 4  f        | f  φ        | φ  4      }

As indicated, we can partition $\mathbf{db}_0$ into three subsets $\mathbf{db}_{01}$, $\mathbf{db}_{02}$, and $\mathbf{db}_{03}$ whose active domains have, pairwise,
no constants in common. Consider each of these three subsets in turn.

1. $\mathbf{db}_{01}$ has two repairs, each of which satisfies $q_0$. For every repair r of db, either $\mathbf{r} \models q_{0[x,y,z \mapsto 1,a,\alpha]}$ or
$\mathbf{r} \models q_{0[x,y,z \mapsto 1,a,\kappa]}$.

2. $\mathbf{db}_{02}$ has two repairs, each of which satisfies $q_0$. For every repair r of db, either $\mathbf{r} \models q_{0[x,y,z \mapsto 2,b,\beta]}$ or
$\mathbf{r} \models q_{0[x,y,z \mapsto 2,c,\gamma]}$.

3. $\mathbf{db}_{03}$ has 16 repairs, and for $\mathbf{s} := \{R(3, d), S(d, \delta), V(\delta, 4), R(4, e), S(e, \epsilon), V(\epsilon, 3), S(f, \phi), V(\phi, 4)\}$,
we have that s is a repair of $\mathbf{db}_{03}$ that falsifies $q_0$. It can be seen that s is not grelevant for q in db. Then, by
Lemma 15, every repair of db satisfies q if and only if every repair of $\mathbf{db} \setminus \mathbf{db}_{03}$ satisfies q. That is, $\mathbf{db}_{03}$
can henceforth be ignored.
The following table T summarizes our findings. In the first column (named with a fresh variable u), the values 01
and 02 refer to $\mathbf{db}_{01}$ and $\mathbf{db}_{02}$ respectively. The table includes two blocks (separated by a dashed line for clarity).
The first block indicates that for every repair r of db, either $\mathbf{r} \models q_{0[x,y,z \mapsto 1,a,\alpha]}$ or $\mathbf{r} \models q_{0[x,y,z \mapsto 1,a,\kappa]}$. Likewise
for the second block.

    T | u   x  y  z
    --+-------------
      | 01  1  a  α
      | 01  1  a  κ
      | - - - - - -
      | 02  2  b  β
      | 02  2  c  γ

The table $U_x$ shown below is the projection of T on attributes x and u. This table must be consistent, because by
construction, the active domains of $\mathbf{db}_{01}$ and $\mathbf{db}_{02}$ are disjoint. Likewise for $U_y$ and $U_z$.

    U_x | x  u       U_y | y  u       U_z | z  u
    ----+-------     ----+-------     ----+-------
        | 1  01          | a  01          | α  01
        | 2  02          | b  02          | κ  01
        |                | c  02          | β  02
        |                |                | γ  02

Let db' be the database that extends db with all the facts shown in the tables T, $U_x$, $U_y$, and $U_z$.³ Let $q^* =
(q \setminus q_0) \cup \{T(u, x, y, z), U_x^c(x, u), U_y^c(y, u), U_z^c(z, u)\}$. From our construction, it follows that every repair of db
satisfies q if and only if every repair of db' satisfies q*. ◁
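
The repair counts claimed for the three parts of $\mathbf{db}_0$ can be checked by brute force. The sketch below is illustrative only; it assumes the fact tables as reconstructed above, takes the first position of each relation as its primary key, and spells Greek constants as strings.

```python
from itertools import product


def repairs(facts):
    """All repairs of a set of binary facts (rel, key, value): one fact per block."""
    blocks = {}
    for fact in facts:
        blocks.setdefault((fact[0], fact[1]), []).append(fact)
    return [set(choice) for choice in product(*blocks.values())]


def satisfies_q0(r):
    """True if r satisfies q0 = {R(x, y), S(y, z), V(z, x)}."""
    for rel, x, y in r:
        if rel != "R":
            continue
        for rel2, y2, z in r:
            if rel2 == "S" and y2 == y and ("V", z, x) in r:
                return True
    return False


DB01 = [("R", 1, "a"), ("S", "a", "alpha"), ("S", "a", "kappa"),
        ("V", "alpha", 1), ("V", "kappa", 1)]
DB02 = [("R", 2, "b"), ("R", 2, "c"), ("S", "b", "beta"), ("S", "c", "gamma"),
        ("V", "beta", 2), ("V", "gamma", 2)]
DB03 = [("R", 3, "d"), ("R", 3, "e"), ("R", 4, "e"), ("R", 4, "f"),
        ("S", "d", "delta"), ("S", "e", "epsilon"), ("S", "e", "delta"),
        ("S", "f", "phi"),
        ("V", "delta", 3), ("V", "delta", 4), ("V", "epsilon", 3), ("V", "phi", 4)]

# The repair s of db_03 named in item 3 of the example.
S_REPAIR = {("R", 3, "d"), ("S", "d", "delta"), ("V", "delta", 4),
            ("R", 4, "e"), ("S", "e", "epsilon"), ("V", "epsilon", 3),
            ("S", "f", "phi"), ("V", "phi", 4)}
```

Enumerating repairs is exponential in general; the reduction avoids this by working per gblock and per strong component, as specified later in this section.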


Gblocks and gpurification The following definition strengthens the notion of purification introduced earlier in
Section 3.

Definition 8 Let q be a self-join-free Boolean conjunctive query such that all atoms with mode i in q are simple-
key. Let db be an uncertain database that is purified and typed relative to q. A gblock (generalized block) of db
relative to q is a maximal (with respect to ⊆) subset g of db such that all facts in g have mode i and agree on their
primary-key position (but may disagree on their relation name). Notice that a gblock has at most polynomially
many repairs (in the size of db).⁴ We say that db is gpurified relative to q if for every gblock g of db, every
repair of g is grelevant for q in db. ◁

Clearly, every gblock is the union of one or more blocks. Two facts of the same gblock have the same primary-key
value, but can have distinct relation names.

Example 11 Let q = {R(x, y), S(x, y)}. Let db = {R(a, 1), R(a, 2), S(a, 1), S(a, 2)}. Then, db is purified
and typed relative to q. All facts of db together constitute a gblock. The uncertain database db is not gpurified,
since s = {R(a, 1), S(a, 2)} is a repair of the gblock, and also a repair of db. However, neither R(a, 1) nor
S(a, 2) is relevant for q in s. ◁

Example 12 Let $q = \{R_1(x, y), R_2(x, z), S(y, z)\}$, where the signature of S is [2, 2]. Let db be the uncertain
database containing the following facts.

    R_1 | x  y      R_2 | x  z      S | y  z
    ----+------     ----+------     --+------
        | a  1          | a  3        | 1  3
        | a  2          | a  4        | 2  4

Then, db is purified and typed relative to q. All $R_1$-facts and $R_2$-facts together constitute a gblock. A repair of
this gblock is $\mathbf{s} = \{R_1(a, 1), R_2(a, 4)\}$. The uncertain database db is not gpurified. Indeed, the only repair of db
that extends s is $\{R_1(a, 1), R_2(a, 4), S(1, 3), S(2, 4)\}$ (call it r). Neither $R_1(a, 1)$ nor $R_2(a, 4)$ is relevant for q
in r. ◁

3 Facts of $\mathbf{db}_0$ can be omitted from db', but that is not important.

4 Indeed, since db is purified relative to q, every gblock of db contains at most |q| distinct relation names, and hence has at most $|\mathbf{db}|^{|q|}$
distinct repairs.

The following lemma is similar to Lemma 1 and has an easy proof.

Lemma 16 Let q be a self-join-free Boolean conjunctive query such that all atoms with mode i in q are simple-key.
Let db be an uncertain database that is purified and typed relative to q. It is possible to compute in polynomial
time an uncertain database db' that is gpurified relative to q such that every repair of db satisfies q if and only if
every repair of db' satisfies q.


Specification of the reduction of Lemma 12 Let q and C be as in the statement of Lemma 12. Assume that
the elementary directed cycle C in the Markov graph of q is $x_0 \xrightarrow{M} x_1 \xrightarrow{M} \cdots \xrightarrow{M} x_{k-1} \xrightarrow{M} x_0$. In what follows,
let dissolve(C, q) be as in Definition 6, with $q_0$, $q_1$, $\vec{y}$, u, T, and $U_0, \ldots, U_{k-1}$ as defined there. Moreover, we
write ⊕ for addition modulo k, and ⊖ for subtraction modulo k. For every $i \in \{0, \ldots, k-1\}$, we define $X_i$ as
follows:

\[
X_i := \mathrm{vars}(C^{q}(x_i)).
\]

The reduction of Lemma 12 will be described under the following simplifying assumptions, which can be made
without loss of generality:

• every uncertain database db that is input to CERTAINTY(q) is typed, purified, and gpurified relative to q.
This assumption is without loss of generality as argued in Section 3 and by Lemmas 1 and 16; and

• for every $i \in \{0, \ldots, k-1\}$, no atom of $C^{q}(x_i)$ contains constants or double occurrences of the same
variable. This assumption is without loss of generality by Lemma 11.

Under these notations and assumptions, we describe the reduction of Lemma 12. Let db be an uncertain database
that is input to CERTAINTY(q). Define a directed k-partite graph, denoted G(db), as follows:

1. the vertex set of G(db) is $\bigcup_{i=0}^{k-1} \mathrm{type}(x_i)$; and

2. there is a directed edge from $a \in \mathrm{type}(x_i)$ to $b \in \mathrm{type}(x_{i \oplus 1})$ if for some valuation $\theta$ over vars(q), we have
that $\theta(q) \subseteq \mathbf{db}$ and $\theta(x_i) = a$ and $\theta(x_{i \oplus 1}) = b$. In this case, we say that $\theta[X_i]$ realizes the edge (a, b),
where $\theta[X_i]$ denotes the restriction of $\theta$ on $X_i$.

Notice that distinct valuations can realize the same edge of G(db) (but if db is consistent, then every edge in
G(db) is realized at most once).

Example 13 Let $q = \{R_1(x_0, y_1), R_2(x_0, y_2), S^c(y_1, y_2, x_1), R_3(x_0, y_3), V(x_1, x_0)\}$. Then, $x_0 \xrightarrow{M} x_1$ and
$X_0 = \{x_0, y_1, y_2, y_3\}$. Assume an uncertain database db containing, among others, the following facts.

    R_1 | x_0  y_1     R_2 | x_0  y_2     S | y_1  y_2  x_1     R_3 | x_0  y_3
    ----+----------    ----+----------    --+---------------    ----+----------
        | a    c_1         | a    c_2       | c_1  c_2  1           | a    β
        |                  | a    c_3       | c_1  c_3  1           | a    γ

The graph G(db) contains a directed edge (a, 1), which is realized by $\{x_0 \mapsto a, y_1 \mapsto c_1, y_2 \mapsto c_2, y_3 \mapsto \beta\}$.
The edge (a, 1) is also realized by $\{x_0 \mapsto a, y_1 \mapsto c_1, y_2 \mapsto c_3, y_3 \mapsto \gamma\}$. ◁

Let [db] be the subset of db that contains all facts with mode c. Significantly, the edges in G(db) outgoing from
some constant $a \in \mathrm{type}(x_j)$ (for some $j \in \{0, \ldots, k-1\}$) are fully determined by [db] and the gblock of db
containing all facts whose relation name is in $C^{q}(x_j)$ and whose primary-key position contains the constant a (call
this gblock $\mathbf{g}_a$). Since db is gpurified, for every repair s of $\mathbf{g}_a$, there exists a unique constant $b \in \mathrm{type}(x_{j \oplus 1})$ such
that

\[
\mathbf{s} \cup [\mathbf{db}] \models (C^{q}(x_j) \cup [q])_{[x_j, x_{j \oplus 1} \mapsto a, b]},
\]

in which case G(db) will contain a directed edge from a to b. Uniqueness of b follows from $\mathcal{K}(C^{q}(x_j) \cup [q]) \models
x_j \rightarrow x_{j \oplus 1}$ and [16, Lemma 4.3].

Since db is gpurified, G(db) is a vertex-disjoint union of strong components such that no edge leads from one
strong component to another strong component (i.e., all strong components are initial).⁵ In what follows, let D be
a strong component of G(db). Since G(db) is k-partite, the length of any cycle in G(db) must be a multiple of
k, i.e., must be in {k, 2k, 3k, ...}. Let $\mathbf{db}_D$ be the subset of db that contains $R(a, \vec{b})$ whenever R is of mode i
and the constant a is a vertex in D (and $\vec{b}$ is any sequence of constants). Obviously, every block of db is either
included in $\mathbf{db}_D$ or disjoint with $\mathbf{db}_D$.

5 Strong components are defined by Definition 1.

Clearly, D must contain a cycle. Among the cycles in D of length exactly k, we now distinguish the cycles that
support q from those that do not, as defined next. Let such a cycle in D be

\[
a_0, a_1, \ldots, a_{k-1}, a_0 \tag{5}
\]

where for $i \in \{0, \ldots, k-1\}$, $a_i \in \mathrm{type}(x_i)$. For $i \in \{0, \ldots, k-1\}$, let $\Delta_i$ be the set of all valuations over $X_i$
that realize $(a_i, a_{i \oplus 1})$. We say that the cycle (5) supports q if for all $i, j \in \{0, \ldots, k-1\}$, for all $\mu_i \in \Delta_i$
and $\mu_j \in \Delta_j$, it is the case that $\mu_i$ and $\mu_j$ agree on all variables in $X_i \cap X_j$. Notice that $X_i \cap X_j$ can be empty.
The cycle (5) may not support q, because $\mu_i$ and $\mu_j$ can disagree on variables in $X_i \cap X_j \cap \mathrm{vars}(\vec{y})$, as illustrated
next.


Example 14 Let $q = \{R(x_0, x_1, y), S(x_1, x_0, y)\}$. We have $x_0 \xrightarrow{M} x_1 \xrightarrow{M} x_0$. Let db be the uncertain database
containing the following facts.

    R | x_0  x_1  y       S | x_1  x_0  y
    --+-------------      --+-------------
      | a    1    α         | 1    a    α
      | a    1    β         | 1    a    β

The edge set of G(db) is {(a, 1), (1, a)}. Both (a, 1) and (1, a) are realized by the valuations $\{x_0 \mapsto a, x_1 \mapsto 1,
y \mapsto \alpha\}$ and $\{x_0 \mapsto a, x_1 \mapsto 1, y \mapsto \beta\}$, which disagree on y. Hence, the cycle a, 1, a does not support q. ◁
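
The notions of realizing an edge and supporting q can be tested directly on Example 14. The following sketch is illustrative and uses hypothetical encodings: a query is a list of (relation, variable tuple) pairs, a database is a set of (relation, value tuple) pairs, and the sets $X_i$ are passed in explicitly.

```python
from itertools import product


def valuations(query, db):
    """All valuations theta (as dicts) over the query variables with theta(q) inside db."""
    results = [{}]
    for rel, variables in query:
        extended = []
        for theta in results:
            for rel2, values in db:
                if rel2 != rel or len(values) != len(variables):
                    continue
                candidate = dict(theta)
                # Bind each variable, rejecting the fact on any conflicting binding.
                if all(candidate.setdefault(v, c) == c
                       for v, c in zip(variables, values)):
                    extended.append(candidate)
        results = extended
    return results


def supports(cycle_vars, cycle_consts, x_sets, query, db):
    """Does the cycle a_0, ..., a_{k-1}, a_0 support q?"""
    k = len(cycle_vars)
    thetas = valuations(query, db)
    realizers = []
    for i in range(k):
        nxt = (i + 1) % k
        # Restrictions to X_i of the valuations that realize (a_i, a_{i+1 mod k}).
        realizers.append([
            {v: th[v] for v in x_sets[i]}
            for th in thetas
            if th[cycle_vars[i]] == cycle_consts[i]
            and th[cycle_vars[nxt]] == cycle_consts[nxt]
        ])
    for i, j in product(range(k), repeat=2):
        shared = x_sets[i] & x_sets[j]
        for mu_i in realizers[i]:
            for mu_j in realizers[j]:
                if any(mu_i[v] != mu_j[v] for v in shared):
                    return False
    return True


QUERY_14 = [("R", ("x0", "x1", "y")), ("S", ("x1", "x0", "y"))]
DB_14 = {("R", ("a", 1, "alpha")), ("R", ("a", 1, "beta")),
         ("S", (1, "a", "alpha")), ("S", (1, "a", "beta"))}
X_14 = [{"x0", "x1", "y"}, {"x0", "x1", "y"}]
```

With both values of y present, `supports(["x0", "x1"], ["a", 1], X_14, QUERY_14, DB_14)` is false, in line with the example; removing the β-facts leaves a single realizing valuation per edge.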


On the other hand, we can assume without loss of generality that $\mu_i$ and $\mu_j$ agree on all variables in $X_i \cap X_j \cap
\{x_0, \ldots, x_{k-1}\}$. In particular, if $x_i \in X_j$, then $\mu_j(x_i) = \mu_i(x_i) = a_i$. To see why this is the case, assume that
$x_i \in X_j$, where $i, j \in \{0, \ldots, k-1\}$ and $i \neq j$. Then, it must be that $x_j \xrightarrow{M} x_i$. Two cases can occur:

• if $j = i \ominus 1$, then $\mu_j$ realizes the edge $(a_{i \ominus 1}, a_i)$ and $\mu_j(x_i) = a_i$; and

• if $j \neq i \ominus 1$, then $x_j \xrightarrow{M} x_i \xrightarrow{M} x_{i \oplus 1} \cdots x_{j \ominus 1} \xrightarrow{M} x_j$ is a shorter Markov cycle.

The second case can be avoided by picking C to be the shorter cycle, as illustrated by Example 15. It can be seen
that such a choice of C is without loss of generality. In particular, in Lemma 14, if C was premier, then the shorter
cycle will also be premier.


Example 15 Let $q = \{R(x_0, x_1), S(x_1, x_2, x_0), V(x_2, x_0)\}$. Then, $x_0 \xrightarrow{M} x_1 \xrightarrow{M} x_2 \xrightarrow{M} x_0$. We have
$X_0 = \{x_0, x_1\}$, $X_1 = \{x_1, x_2, x_0\}$, and $X_2 = \{x_2, x_0\}$. Assume an uncertain database db with the following
facts.

    R | x_0  x_1        S | x_1  x_2  x_0        V | x_2  x_0
      | a    1            | 1    β    a            | β    a
      | b    1            | 1    β    b            | β    b

The graph $\mathcal{G}(\mathbf{db})$ contains an elementary directed cycle a, 1, β, a. The edge (a, 1) is realized by $\mu_0 = \{x_0 \mapsto a, x_1 \mapsto 1\}$.
The edge (1, β) is realized, among others, by $\mu_1 = \{x_1 \mapsto 1, x_2 \mapsto \beta, x_0 \mapsto b\}$. Notice that $\mu_0$ and $\mu_1$
disagree on $x_0$. Although it is easy to deal with this situation where two valuations disagree on a variable in the
Markov cycle, it is even easier to avoid this situation by working with the shorter Markov cycle $x_0 \xrightarrow{M} x_1 \xrightarrow{M} x_0$. ◁


We now distinguish two cases. 


Case D contains either an elementary directed cycle of size k that does not support q, or an elementary
directed cycle of size strictly greater than k. We show in the next paragraph how to construct a repair s of
$\mathbf{db}_D$ such that s is not grelevant for q in db. Then, by Lemma 15, every repair of db satisfies q if and only if
every repair of $\mathbf{db} \setminus \mathbf{db}_D$ satisfies q. In this case, the reduction deletes from db all facts of $\mathbf{db}_D$.


The construction of s proceeds as follows. Pick an elementary cycle in D that has size strictly greater than k, or
that has size k but does not support q. The cycle picked will henceforth be denoted by $\mathcal{E}$. Construct a maximal
sequence

$$(V_0, E_0), b_1, (V_1, E_1), b_2, (V_2, E_2), \ldots, b_n, (V_n, E_n)$$

where

1. $V_0$ is the set of vertices in $\mathcal{E}$, and $E_0$ is the set of directed edges in $\mathcal{E}$; and

2. for every $i \in \{1, \ldots, n\}$,

(a) $b_i \notin V_{i-1}$ and for some $c \in V_{i-1}$, $(b_i, c)$ is a directed edge in $\mathcal{G}(\mathbf{db})$; and

(b) $V_i = V_{i-1} \cup \{b_i\}$ and $E_i = E_{i-1} \cup \{(b_i, c)\}$.

The resulting graph $(V_n, E_n)$ is such that $V_n$ is equal to the vertex set of D, and $E_n$ contains exactly one outgoing
edge for each vertex in $V_n$. The graph $(V_n, E_n)$ contains no directed cycle other than $\mathcal{E}$. To construct s, for each
$j \in \{0, \ldots, k-1\}$, for each vertex $a \in V_n \cap \mathsf{type}(x_j)$, select some valuation $\mu$ that realizes the edge in $E_n$
outgoing from a, and add $\mu(C_q(x_j))$ to s. If $\mathcal{E}$ has size k, then the valuations $\mu$ should be selected such that for
some vertices a, b in $\mathcal{E}$, the valuations chosen for a and b disagree on some variable of $\mathsf{vars}(\vec{y})$. It is not hard to
see that the set s so obtained is a repair of $\mathbf{db}_D$ that is not grelevant for q in db.
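The growth of $(V_0, E_0)$ into $(V_n, E_n)$ can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `extend_cycle` and the input encoding (a vertex list, an edge list, and the chosen cycle as a list whose closing edge is implicit) are hypothetical, and the selection of valuations for the chosen edges is not modelled.

```python
def extend_cycle(vertices, edges, cycle):
    """Grow the chosen cycle into a graph with one outgoing edge per vertex.

    vertices: vertices of the strong component D
    edges:    list of directed edges (u, v) of the component
    cycle:    list [a_0, ..., a_{m-1}]; the edge (a_{m-1}, a_0) is implicit
    Returns a dict mapping every reached vertex to the head of its edge.
    """
    # (V_0, E_0): the vertices and edges of the chosen cycle.
    out_edge = {a: cycle[(i + 1) % len(cycle)] for i, a in enumerate(cycle)}
    pending = [v for v in vertices if v not in out_edge]
    progress = True
    while pending and progress:
        progress = False
        for b in list(pending):
            # Step 2(a): find an edge (b, c) with c already in V_{i-1}.
            target = next(
                (c for (u, c) in edges if u == b and c in out_edge), None
            )
            if target is not None:
                # Step 2(b): add b and the edge (b, c).
                out_edge[b] = target
                pending.remove(b)
                progress = True
    return out_edge


# Hypothetical component: the cycle 1, 2, 1 with a tail 4 -> 3 -> 2.
functional = extend_cycle([1, 2, 3, 4], [(1, 2), (2, 1), (3, 2), (4, 3)], [1, 2])
```

Because every added vertex points to a vertex that was selected earlier, the only directed cycle in the result is the one the construction started from.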

We illustrate the above construction by two examples. 

Example 16 In Example 14, one can choose $\mathbf{s} = \{R(a, 1, \alpha), S(1, a, \beta)\}$. The treatment of a directed cycle of
size strictly greater than k is illustrated by $\mathbf{db}_{03}$ in Example 10. ◁

Example 17 Let $q = \{R(x_0, y_1, y_2), V(x_1, y_2), S_1^c(y_1, y_2, x_1), S_2^c(y_2, x_0)\}$. We have $x_0 \xrightarrow{M} x_1 \xrightarrow{M} x_0$,
$X_0 = \{x_0, y_1, y_2\}$, and $X_1 = \{x_1, y_2\}$. Let db be an uncertain database with the following facts.

    R | x_0  y_1  y_2      V | x_1  y_2      S_1^c | y_1  y_2  x_1      S_2^c | y_2  x_0
      | a    1    2          | γ    2              | 1    2    γ              | 2    a
      | a    3    4          | γ    4              | 3    4    γ              | 4    a
      | a    1    6          | β    6              | 1    6    β              | 6    a

The following table lists the edges in $\mathcal{G}(\mathbf{db})$, by type, along with the valuations that realize each edge.

    Edges in type(x_0) × type(x_1)
    Edge     Realized by
    (a, γ)   $\{x_0 \mapsto a, y_1 \mapsto 1, y_2 \mapsto 2\} = \mu_1$
             $\{x_0 \mapsto a, y_1 \mapsto 3, y_2 \mapsto 4\} = \mu_2$
    (a, β)   $\{x_0 \mapsto a, y_1 \mapsto 1, y_2 \mapsto 6\} = \mu_3$

    Edges in type(x_1) × type(x_0)
    Edge     Realized by
    (γ, a)   $\{x_1 \mapsto \gamma, y_2 \mapsto 2\} = \mu_4$
             $\{x_1 \mapsto \gamma, y_2 \mapsto 4\} = \mu_5$
    (β, a)   $\{x_1 \mapsto \beta, y_2 \mapsto 6\} = \mu_6$

Then, $\mathcal{G}(\mathbf{db})$ contains two elementary cycles, a, γ, a and a, β, a, both of length 2. The cycle a, β, a supports q.
The cycle a, γ, a does not support q, because $\mu_1$ and $\mu_5$ disagree on $y_2$. Therefore, the edges (a, γ) and (γ, a),
along with $\mu_1$ and $\mu_5$, will be used in the construction of a consistent set s that is not grelevant for q in db. For the
remaining vertex β, we add the edge (β, a), which is only realized by $\mu_6$. Then, s contains the R-fact R(a, 1, 2)
(because of $\mu_1$), and the V-facts V(γ, 4) and V(β, 6) (because of $\mu_5$ and $\mu_6$ respectively). In this example, there
is only one repair that contains s, and this repair falsifies q. ◁


Case every elementary directed cycle in D has length k and supports q. In this case, we will encode each
cycle of D as a set of T-facts, as follows. Consider any cycle of the form (5) in D, and take the cross product

$$\Delta_0 \times \Delta_1 \times \cdots \times \Delta_{k-1}, \qquad (6)$$

which is of polynomial size (in the size of db). Since we are in the case where any cycle of the form (5) supports
q, for every tuple $(\mu_0, \mu_1, \ldots, \mu_{k-1})$ in the cross product (6), the set $\mu := \bigcup_{i=0}^{k-1} \mu_i$ is a well defined valuation over
$\{x_0, \ldots, x_{k-1}\} \cup \mathsf{vars}(\vec{y})$. In this case, for each such tuple, the reduction adds the following k + 1 facts:

$$T(D, a_0, \ldots, a_{k-1}, \mu(\vec{y}))$$

$$U_0(a_0, D)$$

$$\vdots$$

$$U_{k-1}(a_{k-1}, D)$$

in which D is used as a constant. Recall that $a_i = \mu(x_i)$ for $i \in \{0, \ldots, k-1\}$. Notice that if the sequence $\vec{y}$
is empty, then the reduction will add exactly one T-fact for every cycle of the form (5). Otherwise, the reduction
may add multiple T-facts for the same cycle, as illustrated next.
Example 18 Let $q = \{R(x_0, x_1, y), S(x_1, x_0)\}$. We have $x_0 \xrightarrow{M} x_1 \xrightarrow{M} x_0$, $X_0 = \{x_0, x_1, y\}$ and $X_1 = \{x_0, x_1\}$.
Let db be the uncertain database containing the following facts.

    R | x_0  x_1  y        S | x_1  x_0
      | a    1    α          | 1    a
      | a    1    β

The edge set of $\mathcal{G}(\mathbf{db})$ is $\{(a, 1), (1, a)\}$. The edge (a, 1) is realized by both $\{x_0 \mapsto a, x_1 \mapsto 1, y \mapsto \alpha\}$ and
$\{x_0 \mapsto a, x_1 \mapsto 1, y \mapsto \beta\}$. The edge (1, a) is realized only by $\{x_0 \mapsto a, x_1 \mapsto 1\}$. The cycle a, 1, a in $\mathcal{G}(\mathbf{db})$
supports q. The reduction will add the following T-facts (for some identifier D):

    T | u  x_0  x_1  y
      | D  a    1    α
      | D  a    1    β

◁

Example 19 Take the query q of Example 11 with the following uncertain database db.


    R | x_0  y_1  y_2      V | x_1  y_2      S_1^c | y_1  y_2  x_1      S_2^c | y_2  x_0
      | a    1    2          | γ    2              | 1    2    γ              | 2    a
      | a    1    6          | β    6              | 1    6    β              | 6    a
      | a    3    6                                | 3    6    β

Then, $\mathcal{G}(\mathbf{db})$ contains two elementary cycles, a, γ, a and a, β, a, both of length 2 and both supporting q. The
reduction will add the following T-facts (for some identifier D):

    T | u  x_0  x_1  y_1  y_2
      | D  a    γ    1    2
      | D  a    β    1    6
      | D  a    β    3    6

◁
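The encoding of a supporting cycle into T-facts and U-facts can be sketched with `itertools.product`. The sketch below is only an illustration: the function name `t_facts`, the tuple encoding of facts, and the relation names `U0`, `U1`, ... are assumptions of the sketch. The data are those of the cycle a, β, a in Example 19.

```python
from itertools import product


def t_facts(component_id, cycle, deltas, y_vars):
    """Encode one supporting cycle as T-facts and U-facts.

    cycle:   tuple (a_0, ..., a_{k-1})
    deltas:  deltas[i] lists the valuations (dicts) realizing the i-th edge
    y_vars:  the variables of the sequence y, in a fixed order
    """
    t_out = set()
    for combo in product(*deltas):
        mu = {}
        for mu_i in combo:
            # The union is well defined because the cycle supports q.
            mu.update(mu_i)
        t_out.add(("T", component_id, *cycle, *(mu[y] for y in y_vars)))
    u_out = {(f"U{i}", a_i, component_id) for i, a_i in enumerate(cycle)}
    return t_out, u_out


# Cycle a, beta, a of Example 19.
deltas_beta = [
    [{"x0": "a", "y1": 1, "y2": 6}, {"x0": "a", "y1": 3, "y2": 6}],
    [{"x1": "beta", "y2": 6}],
]
t_beta, u_beta = t_facts("D", ("a", "beta"), deltas_beta, ["y1", "y2"])
```

The edge (a, β) is realized by two valuations, so the cross product has two tuples and the cycle yields two T-facts, matching the last two rows of the table above.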


Each relation $U_i$ encodes that each constant in $\mathsf{type}(x_i) \cap \mathsf{adom}(\mathbf{db})$ occurs in a unique strong component of
$\mathcal{G}(\mathbf{db})$. The meaning of the T-facts is as follows. Let $V = \{x_0, \ldots, x_{k-1}\} \cup \mathsf{vars}(\vec{y})$. Let $\Theta_D$ be the set of all
valuations $\mu$ over V such that

$$T(D, \mu(x_0), \ldots, \mu(x_{k-1}), \mu(\vec{y}))$$

has been added by the reduction. Then the following hold (recall $q_0 = \bigcup_{i=0}^{k-1} C_q(x_i)$):

• for every repair r of db, there exists $\mu \in \Theta_D$ such that $\mathbf{r} \models \mu(q_0)$; and

• for every $\mu \in \Theta_D$, there exists a repair r of db such that

1. $\mathbf{r} \models \mu(q_0)$; and

2. for each $\mu' \in \Theta_D$, if $\mu' \neq \mu$, then $\mathbf{r} \not\models \mu'(q_0)$.

The cycles in D can be found in polynomial time by solving reachability problems, as explained in fl7l Theorem 4] 
and E). The crux is that the number of cycles in Q (db) of length exactly k is polynomially bounded. Any longer 
cycle consists of an elementary path $a_0, a_1, \ldots, a_{k-1}, a'_0$ of length k ($a_0 \neq a'_0$), concatenated with an elementary
path from $a'_0$ to $a_0$ that contains no vertex in $\{a_1, \ldots, a_{k-1}\}$. Notice incidentally that the reduction needs to know
the existence (or not) of cycles of size strictly greater than k in any strong component D, but the vertices on such
a cycle need not be remembered.
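Since k is fixed by the query, the cycles of length exactly k can be enumerated by a depth-bounded search. The following is a brute-force sketch and not the reachability-based procedure referred to above. The name `cycles_of_length` and the adjacency-dictionary encoding are assumptions, and the search starts only from the vertices passed in `starts` (for instance the vertices of $\mathsf{type}(x_0)$), so that each cycle is reported once.

```python
def cycles_of_length(adj, starts, k):
    """Enumerate elementary directed cycles with exactly k vertices.

    adj:    dict mapping a vertex to the list of its successors
    starts: vertices at which a reported cycle may begin
    Returns a list of tuples (a_0, ..., a_{k-1}); the closing edge back to
    a_0 is implicit. At most n**k paths are explored, with k a constant.
    """
    found = []

    def extend(path):
        if len(path) == k:
            # Close the cycle if the last vertex has an edge to the first.
            if path[0] in adj.get(path[-1], ()):
                found.append(tuple(path))
            return
        for nxt in adj.get(path[-1], ()):
            if nxt not in path:  # keep the path elementary
                extend(path + [nxt])

    for s in starts:
        extend([s])
    return found


# Graph of Example 17: edges (a, gamma), (gamma, a), (a, beta), (beta, a).
adj_17 = {"a": ["gamma", "beta"], "gamma": ["a"], "beta": ["a"]}
cycles_17 = cycles_of_length(adj_17, ["a"], 2)
```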

It can now be seen that, in general, the above reduction results in a database db' that is as in the following 
lemma. 

Lemma 17 Let q and C be as in the statement of Lemma 12. Let q* = dissolve(q, C), and let the variable u be as
in Definition^ Let db be an uncertain database that is input to CERTAINTY(g). We can compute in polynomial 
time an uncertain database db' that is a legal input to CERTAINTY(q*) such that the following hold:

1. for every repair r of db, there exists a repair r' of db' such that for every valuation $\theta$ over vars(q*), if
$\theta(q^*) \subseteq \mathbf{r}'$, then $\theta(q) \subseteq \mathbf{r}$; and

2. for every repair r' of db', there exists a repair r of db such that for every valuation $\theta$ over vars(q), if
$\theta(q) \subseteq \mathbf{r}$, then there exists a constant D such that $\theta_{[u \mapsto D]}(q^*) \subseteq \mathbf{r}'$.

We can now prove Lemma 12.

Proof of Lemma 12 Let db be an uncertain database that is input to CERTAINTY(q). By Lemma 17, we can
compute in polynomial time an uncertain database db' that is a legal input to CERTAINTY(q*) such that db'
satisfies conditions 1 and 2 in the statement of Lemma 17. It suffices to show that the following are equivalent.

1. Every repair of db satisfies q.

2. Every repair of db' satisfies q*.

1 ⟹ 2 Proof by contraposition. Assume a repair r' of db' such that $\mathbf{r}' \not\models q^*$. By item 2 in the statement of
Lemma 17, we can assume a repair r of db such that for every valuation $\theta$ over vars(q), if $\theta(q) \subseteq \mathbf{r}$, then there
exists a constant D such that $\theta_{[u \mapsto D]}(q^*) \subseteq \mathbf{r}'$. Obviously, if $\mathbf{r} \models q$, then $\mathbf{r}' \models q^*$, a contradiction. We conclude by
contradiction that $\mathbf{r} \not\models q$.

2 ⟹ 1 Proof by contraposition. Assume a repair r of db such that $\mathbf{r} \not\models q$. By item 1
in the statement of Lemma 17, we can assume a repair r' of db' such that for every valuation $\theta$ over vars(q*), if
$\theta(q^*) \subseteq \mathbf{r}'$, then $\theta(q) \subseteq \mathbf{r}$. Obviously, $\mathbf{r}' \not\models q^*$. □


8 Conclusion 

This paper settles a long-standing open question in certain query answering, by establishing an effective
complexity trichotomy in the set containing CERTAINTY(q) for each self-join-free Boolean conjunctive query q. In
particular, we show that, given q, there exists a procedure that looks at the structure of the attack graph of q and
decides whether CERTAINTY(q) is in FO, in P \ FO, or coNP-complete.

The exciting question that still remains open is whether the above trichotomy can be extended beyond
self-join-free conjunctive queries, to conjunctive queries with self-joins and unions of conjunctive queries.
A Proofs for Section |H

A.1 Proof of Lemma 3

We use the following helping lemma. 

Lemma 18 Let q be a self-join-free Boolean conjunctive query. Let $F, G \in q$ such that $F \stackrel{q}{\rightsquigarrow} G$. Then, for every
$x \in F^{+,q} \setminus G^{+,q}$, there exists a sequence $F_0, F_1, \ldots, F_n$ of atoms of q such that

• $F_0 = F$;

• for all $i \in \{0, \ldots, n-1\}$, $\mathsf{vars}(F_i) \cap \mathsf{vars}(F_{i+1}) \not\subseteq G^{+,q}$; and

• $x \in \mathsf{vars}(F_n)$.

Proof Consider a maximal sequence

$$\mathsf{key}(F) = S_0 \xrightarrow{H_1} S_1 \xrightarrow{H_2} \cdots \; S_{k-1} \xrightarrow{H_k} S_k$$

where

1. $S_0 \subsetneq S_1 \subsetneq \cdots \subsetneq S_{k-1} \subsetneq S_k$; and

2. for every $i \in \{1, 2, \ldots, k\}$,

(a) $H_i \in q \setminus \{F\}$. Thus, $\mathcal{K}(q \setminus \{F\})$ contains the functional dependency $\mathsf{key}(H_i) \to \mathsf{vars}(H_i)$.

(b) $\mathsf{key}(H_i) \subseteq S_{i-1}$ and $S_i = S_{i-1} \cup \mathsf{vars}(H_i)$.

Then, $S_k = F^{+,q}$. From $F \stackrel{q}{\rightsquigarrow} G$, it follows $G \notin \{H_1, \ldots, H_k\}$. For every $v \in S_k$, define d(v) as the smallest
integer i such that $v \in S_i$. Let $x \in F^{+,q} \setminus G^{+,q}$. We construct the desired sequence by induction on d(x).

Basis: d(x) = 0. Then the desired sequence is F.

Step: d(x) = i. Hence, $x \in S_i$ and $x \notin S_{i-1}$. Then, $x \notin \mathsf{key}(H_i) \subseteq S_{i-1}$ and $x \in \mathsf{vars}(H_i)$. Since
$H_i \neq G$, we have $\mathsf{key}(H_i) \not\subseteq G^{+,q}$, or else $x \in G^{+,q}$, a contradiction. Therefore, we can assume some variable
$y \in \mathsf{key}(H_i) \setminus G^{+,q}$. Since $y \in S_{i-1}$, we have d(y) < d(x). By the induction hypothesis, there exists a sequence
$F_0, F_1, \ldots, F_n$ of atoms of q such that

• $F_0 = F$;

• for all $i \in \{0, \ldots, n-1\}$, $\mathsf{vars}(F_i) \cap \mathsf{vars}(F_{i+1}) \not\subseteq G^{+,q}$; and

• $y \in \mathsf{vars}(F_n)$.

The desired sequence is $F_0, F_1, \ldots, F_n, H_i$. □
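The sequence $S_0, S_1, \ldots, S_k$ and the depth function d used in this proof can be sketched as a fixpoint computation. The encoding below is an assumption of the sketch: each atom is given as a pair (key variables, all variables), and the function name `closure_with_depth` is hypothetical. The dependency of F itself is skipped, since the closure is taken with respect to $\mathcal{K}(q \setminus \{F\})$.

```python
def closure_with_depth(atoms, f_name):
    """Compute F^{+,q} together with the depth d(v) of each variable v.

    atoms:  dict mapping a relation name to (key_vars, all_vars)
    f_name: the relation name of the atom F
    Returns a dict v -> d(v); its key set is the closure of key(F).
    """
    depth = {v: 0 for v in atoms[f_name][0]}  # S_0 = key(F)
    step = 0
    changed = True
    while changed:
        changed = False
        for name, (key, all_vars) in atoms.items():
            if name == f_name:
                continue  # only dependencies of q \ {F} may be used
            known = set(depth)
            if set(key) <= known and not set(all_vars) <= known:
                step += 1  # the atom plays the role of H_step
                for v in all_vars:
                    depth.setdefault(v, step)
                changed = True
    return depth


# Hypothetical query: R(x; y), S(x; z), T(z; w), U(w, u; v), keys before ';'.
atoms = {
    "R": (("x",), ("x", "y")),
    "S": (("x",), ("x", "z")),
    "T": (("z",), ("z", "w")),
    "U": (("w", "u"), ("w", "u", "v")),
}
depth_R = closure_with_depth(atoms, "R")
```

In this hypothetical query, y is not in the closure of R, because the only dependency that yields y is the one of R itself.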


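The proof of Lemma 3 is given next.

Proof of Lemma 3 Assume $F \stackrel{q}{\rightsquigarrow} G$, $G \stackrel{q}{\rightsquigarrow} H$, and $F \stackrel{q}{\not\rightsquigarrow} H$.

Since $F \stackrel{q}{\rightsquigarrow} G$, there exists a sequence $F_0, F_1, \ldots, F_n$ of atoms of q such that

• $F_0 = F$ and $F_n = G$; and

• for all $i \in \{0, \ldots, n-1\}$, $\mathsf{vars}(F_i) \cap \mathsf{vars}(F_{i+1}) \not\subseteq F^{+,q}$.

Since $G \stackrel{q}{\rightsquigarrow} H$, there exists a sequence $G_0, G_1, \ldots, G_m$ of atoms of q such that

• $G_0 = G$ and $G_m = H$; and

• for all $i \in \{0, \ldots, m-1\}$, $\mathsf{vars}(G_i) \cap \mathsf{vars}(G_{i+1}) \not\subseteq G^{+,q}$.

Consider the path

$$F_0, F_1, \ldots, F_n, G_1, G_2, \ldots, G_m$$

where $F_0 = F$, $F_n = G = G_0$, and $G_m = H$. Since $F \stackrel{q}{\not\rightsquigarrow} H$, we can assume $j \in \{0, \ldots, m-1\}$ such that
$\mathsf{vars}(G_j) \cap \mathsf{vars}(G_{j+1}) \subseteq F^{+,q}$. Since $\mathsf{vars}(G_j) \cap \mathsf{vars}(G_{j+1}) \not\subseteq G^{+,q}$, we can assume $x \in \mathsf{vars}(G_j) \cap \mathsf{vars}(G_{j+1})$
such that $x \in F^{+,q} \setminus G^{+,q}$.

By Lemma 18, there exists a sequence $H_0, H_1, \ldots, H_k$ of atoms of q such that

• $H_0 = F$;

• for all $i \in \{0, \ldots, k-1\}$, $\mathsf{vars}(H_i) \cap \mathsf{vars}(H_{i+1}) \not\subseteq G^{+,q}$; and

• $x \in \mathsf{vars}(H_k)$.

Consider the sequence

$$G_0, G_1, \ldots, G_j, H_k, H_{k-1}, \ldots, H_0,$$

where $G_0 = G$ and $H_0 = F$. Every two consecutive atoms in this sequence share a variable not in $G^{+,q}$. In
particular, $G_j$ and $H_k$ share the variable x. It follows $G \stackrel{q}{\rightsquigarrow} F$. □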
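A.2 Proof of Lemma 4

Proof of Lemma 4 The first item is an immediate consequence of Lemma 3. In what follows, we show the second
item.

We show that if the attack graph of q contains a strong cycle of length n with $n \geq 3$, then it contains a strong cycle
of some length m with m < n.

Let $H_0 \stackrel{q}{\rightsquigarrow} H_1 \stackrel{q}{\rightsquigarrow} H_2 \stackrel{q}{\rightsquigarrow} \cdots \stackrel{q}{\rightsquigarrow} H_{n-1} \stackrel{q}{\rightsquigarrow} H_0$ be a strong cycle of length n ($n \geq 3$) in the attack graph of q,
where $i \neq j$ implies $H_i \neq H_j$. Assume without loss of generality that the attack $H_0 \stackrel{q}{\rightsquigarrow} H_1$ is strong. Thus,
$\mathcal{K}(q) \not\models \mathsf{key}(H_0) \to \mathsf{key}(H_1)$.

We write $i \oplus j$ as shorthand for $(i + j) \bmod n$. If $H_1 \stackrel{q}{\rightsquigarrow} H_{1 \oplus 2}$, then $H_0 \stackrel{q}{\rightsquigarrow} H_1 \stackrel{q}{\rightsquigarrow} H_{1 \oplus 2} \stackrel{q}{\rightsquigarrow} \cdots \stackrel{q}{\rightsquigarrow} H_{n-1} \stackrel{q}{\rightsquigarrow} H_0$
is a strong cycle of length n − 1, and the desired result holds. Assume next $H_1 \stackrel{q}{\not\rightsquigarrow} H_{1 \oplus 2}$. By Lemma 3,
$H_2 \stackrel{q}{\rightsquigarrow} H_1$. We distinguish two cases.

Case $H_2 \stackrel{q}{\rightsquigarrow} H_1$ is a strong attack. Then $H_1 \stackrel{q}{\rightsquigarrow} H_2 \stackrel{q}{\rightsquigarrow} H_1$ is a strong cycle of length 2 < n.

Case $H_2 \stackrel{q}{\rightsquigarrow} H_1$ is a weak attack. If $H_1 \stackrel{q}{\rightsquigarrow} H_0$, then $H_0 \stackrel{q}{\rightsquigarrow} H_1 \stackrel{q}{\rightsquigarrow} H_0$ is a strong cycle of length 2 < n.
Assume next $H_1 \stackrel{q}{\not\rightsquigarrow} H_0$. Then, from $H_0 \stackrel{q}{\rightsquigarrow} H_1 \stackrel{q}{\rightsquigarrow} H_2$ and Lemma 3, it follows $H_0 \stackrel{q}{\rightsquigarrow} H_2$. The cycle
$H_0 \stackrel{q}{\rightsquigarrow} H_2 \stackrel{q}{\rightsquigarrow} H_{2 \oplus 1} \stackrel{q}{\rightsquigarrow} \cdots \stackrel{q}{\rightsquigarrow} H_{n-1} \stackrel{q}{\rightsquigarrow} H_0$ has length n − 1. It suffices to show that the attack $H_0 \stackrel{q}{\rightsquigarrow} H_2$ is
strong. Assume towards a contradiction that the attack $H_0 \stackrel{q}{\rightsquigarrow} H_2$ is weak. Then, $\mathcal{K}(q) \models \mathsf{key}(H_0) \to \mathsf{key}(H_2)$.
Since $H_2 \stackrel{q}{\rightsquigarrow} H_1$ is a weak attack, $\mathcal{K}(q) \models \mathsf{key}(H_2) \to \mathsf{key}(H_1)$. By transitivity, $\mathcal{K}(q) \models \mathsf{key}(H_0) \to \mathsf{key}(H_1)$, a
contradiction. This concludes the proof. □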


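A.3 Proof of Lemma 5

Proof of Lemma 5 Let $q' = q_{[x \mapsto a]}$. For every $F' \in q'$, there exists a (unique) atom $F \in q$ such that $F' = F_{[x \mapsto a]}$.
It can be easily shown that for every $F' \in q'$, we have $F^{+,q} \setminus \{x\} \subseteq F'^{+,q'}$.

Assume $F' \stackrel{q'}{\rightsquigarrow} G'$. Then, there exists a witness $F'_0 \overset{z_1}{\frown} F'_1 \overset{z_2}{\frown} F'_2 \cdots \overset{z_n}{\frown} F'_n$ for $F' \stackrel{q'}{\rightsquigarrow} G'$ where $F'_0 = F'$ and $F'_n = G'$.
It can now be easily seen that $F_0 \overset{z_1}{\frown} F_1 \overset{z_2}{\frown} F_2 \cdots \overset{z_n}{\frown} F_n$ is a witness for $F \stackrel{q}{\rightsquigarrow} G$. Therefore, if the attack graph of
q' is cyclic, then the attack graph of q is cyclic.

The second item in the statement of Lemma 5 follows from the observation that for all $F', G' \in q'$, if $\mathcal{K}(q) \models
\mathsf{key}(F) \to \mathsf{key}(G)$, then $\mathcal{K}(q') \models \mathsf{key}(F') \to \mathsf{key}(G')$. □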


B Proofs for Section 5

B.1 Proof of Lemma 6

Proof of Lemma[6] We show a first-order reduction from the problem UFA (Undirected Forest Accessibility) JT] 
to CERTAINTY($q_0$). In UFA, we are given an acyclic undirected graph, and nodes u, v. The problem is to
determine whether there is a path between u and v. The problem is L-complete, and remains L-complete when
the given graph has exactly two connected components. Moreover, we can assume in the reduction that the two
connected components each contain at least one edge.

Given an acyclic undirected graph G = (V, E) with exactly two connected components, and two nodes u, v, we
construct an uncertain database db as follows:

1. for every edge {a, b} in E, the uncertain database db contains the facts $R_0(a, \{a, b\})$, $R_0(b, \{a, b\})$,
$S_0(\{a, b\}, a)$, and $S_0(\{a, b\}, b)$, in which {a, b} is treated as a constant; and

2. db contains $R_0(u, t)$ and $R_0(v, t)$, where t is a new value not occurring elsewhere.

Clearly, the computation of db from G is in FO. 
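The two construction steps can be sketched directly. In this illustration, facts are tuples, an edge {a, b} is turned into the constant `frozenset((a, b))`, and the function name `ufa_to_db` and the default fresh value `"t"` are assumptions of the sketch. The caller is responsible for choosing a fresh value that does not occur in the graph.

```python
def ufa_to_db(edges, u, v, fresh="t"):
    """Build the uncertain database of the reduction from UFA.

    edges: iterable of pairs (a, b), the undirected edges of the forest
    u, v:  the two query nodes
    fresh: a value assumed not to occur anywhere in the graph
    """
    db = set()
    for a, b in edges:
        e = frozenset((a, b))  # the edge {a, b}, treated as a constant
        db |= {("R0", a, e), ("R0", b, e), ("S0", e, a), ("S0", e, b)}
    db |= {("R0", u, fresh), ("R0", v, fresh)}
    return db


# Hypothetical forest with two components {1, 2} and {3, 4}.
db_ufa = ufa_to_db([(1, 2), (3, 4)], u=1, v=3)
```

Each edge contributes four facts and the two query nodes contribute one fact each, so the output is linear in the size of the input graph.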

We next show that there exists a path between u and v in G if and only if every repair of db satisfies $q_0$.

Assume first that u, v belong to the same connected component. Let db' be the uncertain database that is
constructed from the connected component not containing u, v. Let $a_0, b_0, a_1, b_1, \ldots, a_{n-1}, b_{n-1}, a_n$ be a sequence
of distinct constants such that

1. $a_0 = a_n$ and for $0 \leq i < j \leq n-1$, $a_i \neq a_j$ and $b_i \neq b_j$; and

2. for $i \in \{0, \ldots, n-1\}$, db' contains $R_0(a_i, b_i)$ and $S_0(b_i, a_{i+1})$.

Since G is acyclic, any such sequence satisfies n = 1. An existing algorithm for CERTAINTY($q_0$) [17, 11] will
return that every repair of db' satisfies $q_0$. Consequently, every repair of db satisfies $q_0$.

For the opposite implication, assume that one connected component contains u, and the other contains v. By
Lemma 1, there exists an uncertain database db' that is purified relative to $q_0$ such that $q_0$ is true in every repair
of db' if and only if $q_0$ is true in every repair of db. It is easy to see that if u and v belong to distinct connected
components, then this purified uncertain database db' will be the empty database, whose only repair is the empty
repair, which falsifies $q_0$. It follows that $q_0$ is not true in every repair of db. □


B.2 Proof of Lemma 8

We first show two helping lemmas.

Lemma 19 Let q be a self-join-free Boolean conjunctive query. Let $X \subseteq \mathsf{vars}(q)$ and let $G \in q$ be an R-atom
such that for every $x \in X$, $G \stackrel{q}{\not\rightsquigarrow} x$. Let r be a repair of some database such that $\mathbf{r} \models q$. Let $A \in \mathbf{r}$ be an R-fact that
is relevant for q in r. Let B be key-equal to A and $\mathbf{r}_B = (\mathbf{r} \setminus \{A\}) \cup \{B\}$. Then, for every valuation $\zeta$ over X, if
$\mathbf{r}_B \models \zeta(q)$, then $\mathbf{r} \models \zeta(q)$.

Proof Let $\zeta$ be a valuation over X such that $\mathbf{r}_B \models \zeta(q)$. We can assume a valuation $\zeta^+$ over vars(q) such that
$\zeta^+[X] = \zeta[X]$ and $\zeta^+(q) \subseteq \mathbf{r}_B$. Thus, $\zeta^+$ extends $\zeta$ to vars(q). We need to show $\mathbf{r} \models \zeta(q)$, which is obvious if
$B \notin \zeta^+(q)$. Assume next $B \in \zeta^+(q)$. Since A is relevant for q in r, we can assume a valuation $\mu$ over vars(q) such
that $A \in \mu(q) \subseteq \mathbf{r}$. Let $q' = q \setminus \{G\}$. Let $\mathbf{r}' = \mathbf{r}_B \setminus \{B\} = \mathbf{r} \setminus \{A\}$. Since q' contains no R-atom (no self-join),
$\zeta^+(q') \subseteq \mathbf{r}'$ and $\mu(q') \subseteq \mathbf{r}'$. Moreover, $\zeta^+[\mathsf{key}(G)] = \mu[\mathsf{key}(G)]$, because A and B are key-equal.

From $\mathcal{K}(q') \models \mathsf{key}(G) \to G^{+,q}$ and [16, Lemma 4.3], it follows $\zeta^+[G^{+,q}] = \mu[G^{+,q}]$.

Let $\tau$ be the complete edge-labeled undirected graph whose vertices are the atoms of q; an edge between H and
H' is labeled by $\mathsf{vars}(H) \cap \mathsf{vars}(H')$.

Let $\tau'$ be the graph obtained from $\tau$ by cutting every edge whose label is included in $G^{+,q}$. Let $q_G$ be the subset
of q containing all atoms that are in the strong component of $\tau'$ that contains G. Let $q_X = q \setminus q_G$.

Let $\kappa$ be the valuation over vars(q) such that for every $x \in \mathsf{vars}(q)$,

$$\kappa(x) = \begin{cases} \mu(x) & \text{if } x \in \mathsf{vars}(q_G) \\ \zeta^+(x) & \text{if } x \in \mathsf{vars}(q_X) \end{cases}$$

We show that $\kappa$ is well defined. Assume $x \in \mathsf{vars}(q_X) \cap \mathsf{vars}(q_G)$. Then, there exist atoms $F' \in q_X$ and
$G' \in q_G$ such that $x \in \mathsf{vars}(F') \cap \mathsf{vars}(G')$. Since F' and G' belong to distinct strong components of $\tau'$, it
follows $\mathsf{vars}(F') \cap \mathsf{vars}(G') \subseteq G^{+,q}$. Consequently, $x \in G^{+,q}$. Since $\zeta^+[G^{+,q}] = \mu[G^{+,q}]$, it follows that
$\mu(x) = \zeta^+(x)$.

Obviously, $\kappa(q) \subseteq \mathbf{r}$. Finally, we show that for every $u \in X$, $\kappa(u) = \zeta(u)$. This is obvious if $u \in X \cap G^{+,q}$.
Assume next that $u \in X \setminus G^{+,q}$. Since $G \stackrel{q}{\not\rightsquigarrow} u$ by the assumption in the statement of Lemma 19, it must be the
case that $u \in \mathsf{vars}(q_X)$, hence $\kappa(u) = \zeta^+(u) = \zeta(u)$. It follows $\mathbf{r} \models \zeta(q)$. This concludes the proof. □


The following helping lemma extends [16, Lemma B.1].

Lemma 20 Let q be a self-join-free Boolean conjunctive query. Let $F \in q$ such that F has zero indegree in the
attack graph of q. Let r be a repair of some database. Let $A \in \mathbf{r}$ such that A is relevant for q in r. Let B be
key-equal to A and $\mathbf{r}_B = (\mathbf{r} \setminus \{A\}) \cup \{B\}$. Then, for every valuation $\zeta$ over key(F), if $\mathbf{r}_B \models \zeta(q)$, then $\mathbf{r} \models \zeta(q)$.

Proof The proof is obvious if A has the same relation name as F. Assume next that the relation names in A and F
are distinct. We can assume some atom $G \in q \setminus \{F\}$ such that A has the same relation name as G. Since $G \stackrel{q}{\not\rightsquigarrow} F$,
we have that for each $x \in \mathsf{key}(F)$, $G \stackrel{q}{\not\rightsquigarrow} x$. The desired result then follows by Lemma 19. □

Assume that a query q contains an R-atom that has no incoming attack in the attack graph of q. Paraphrasing
Lemma 20, if one replaces, in a repair r, some relevant fact A with another fact B that belongs to the same block
as A, then every R-fact of r that was not relevant in r will remain non-relevant in $(\mathbf{r} \setminus \{A\}) \cup \{B\}$. Notice,
however, that the fact B may be non-relevant in the new repair $(\mathbf{r} \setminus \{A\}) \cup \{B\}$.

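The proof of Lemma 8 can now be given.

Proof of Lemma 8 Let X = key(F). Let db be an uncertain database. Let r be a repair of db that is $\preceq_q^X$-frugal.
Let s be any repair of db. Construct a maximal sequence

$$(\mathbf{r}_0, \mathbf{s}_0), (\mathbf{r}_1, \mathbf{s}_1), \ldots, (\mathbf{r}_n, \mathbf{s}_n) \qquad (7)$$

where

1. $\mathbf{r}_0 = \mathbf{r}$ and $\mathbf{s}_0 = \mathbf{s}$;

2. for every $i \in \{1, \ldots, n\}$, one of the following holds:

(a) $\mathbf{r}_i = \mathbf{r}_{i-1}$ and $\mathbf{s}_i = (\mathbf{s}_{i-1} \setminus \{A\}) \cup \{B\}$ for distinct, key-equal facts A, B such that $A \in \mathbf{s}_{i-1}$,
$B \in \mathbf{r}_{i-1}$, and A is relevant for q in $\mathbf{s}_{i-1}$; or

(b) $\mathbf{s}_i = \mathbf{s}_{i-1}$ and $\mathbf{r}_i = (\mathbf{r}_{i-1} \setminus \{A\}) \cup \{B\}$ for distinct, key-equal facts A, B such that $A \in \mathbf{r}_{i-1}$,
$B \in \mathbf{s}_{i-1}$, and A is relevant for q in $\mathbf{r}_{i-1}$.

That is, the construction repeatedly replaces a fact that is relevant in one repair with its distinct, key-equal fact in
the other repair. The sequence (7) is finite, since the total number of distinct relevant facts diminishes at each
step. For the last element $(\mathbf{r}_n, \mathbf{s}_n)$, it holds that the set of facts that are relevant for q in $\mathbf{r}_n$ is equal to the set of facts
that are relevant for q in $\mathbf{s}_n$. It follows that for every valuation $\theta$ over X,

$$\mathbf{r}_n \models \theta(q) \iff \mathbf{s}_n \models \theta(q). \qquad (8)$$

By Lemma 20, for every valuation $\theta$ over X,

$$\mathbf{r}_n \models \theta(q) \implies \mathbf{r} \models \theta(q) \qquad (9)$$

$$\mathbf{s}_n \models \theta(q) \implies \mathbf{s} \models \theta(q) \qquad (10)$$

From (9) and since r is $\preceq_q^X$-frugal, it follows that for every valuation $\theta$ over X,

$$\mathbf{r}_n \models \theta(q) \iff \mathbf{r} \models \theta(q) \qquad (11)$$

From (11), (10), and (8), it follows that for every valuation $\theta$ over X,

$$\mathbf{r} \models \theta(q) \implies \mathbf{s} \models \theta(q).$$

Since s is an arbitrary repair, the desired result follows. □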


Recall from Section 3 that $A \in \mathbf{r}$ is relevant for q in r if $A \in \theta(q) \subseteq \mathbf{r}$ for some valuation $\theta$ over vars(q).
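The notion of relevance recalled above can be illustrated by a brute-force sketch that enumerates all valuations $\theta$ with $\theta(q) \subseteq \mathbf{r}$. The encoding is an assumption of the sketch: facts are tuples, query atoms are pairs (relation name, terms), variables are strings starting with `?`, and the function names are hypothetical. The data are the query and facts of Example 17.

```python
def valuations(query, facts):
    """Yield every valuation theta (a dict) such that theta(query) is in facts."""

    def extend(i, theta):
        if i == len(query):
            yield dict(theta)
            return
        rel, terms = query[i]
        for fact in facts:
            if fact[0] != rel or len(fact) - 1 != len(terms):
                continue
            new, ok = dict(theta), True
            for term, value in zip(terms, fact[1:]):
                if isinstance(term, str) and term.startswith("?"):
                    # A variable must keep the value it was bound to earlier.
                    if new.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:  # a constant must match exactly
                    ok = False
                    break
            if ok:
                yield from extend(i + 1, new)

    yield from extend(0, {})


def relevant_facts(query, repair):
    """Return the facts of the repair that occur in some theta(query)."""
    out = set()
    for theta in valuations(query, repair):
        for rel, terms in query:
            out.add((rel, *(theta.get(t, t) for t in terms)))
    return out


q17 = [("R", ("?x0", "?y1", "?y2")), ("V", ("?x1", "?y2")),
       ("S1", ("?y1", "?y2", "?x1")), ("S2", ("?y2", "?x0"))]
consistent = {("S1", 1, 2, "g"), ("S1", 3, 4, "g"), ("S1", 1, 6, "b"),
              ("S2", 2, "a"), ("S2", 4, "a"), ("S2", 6, "a")}
repair_bad = consistent | {("R", "a", 1, 2), ("V", "g", 4), ("V", "b", 6)}
repair_good = consistent | {("R", "a", 1, 6), ("V", "g", 2), ("V", "b", 6)}
```

Here `repair_bad` is the repair that contains the set s of Example 17, with `"g"` and `"b"` standing for γ and β. It has no relevant fact and therefore falsifies q.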
C Proofs for Section 7

This section contains helping lemmas and proofs that are used in the proof of Theorem 5.


C.l Helping Lemmas 

Lemma 21 Let q be a self-join-free Boolean conjunctive query. Let $G \in q$ and $x, y \in \mathsf{vars}(q)$ such that
$\mathcal{K}(q \setminus \{G\}) \models x \to y$ and $y \notin G^{+,q}$. Then, there exists a sequence $G_1, \ldots, G_n$ of distinct atoms in q such
that $x \in \mathsf{vars}(G_1)$, $y \in \mathsf{vars}(G_n)$, and for every $i \in \{1, \ldots, n-1\}$, $\mathsf{vars}(G_i) \cap \mathsf{vars}(G_{i+1}) \not\subseteq G^{+,q}$.

Proof If x = y, then the desired sequence that proves the lemma is any atom that contains x. In the remainder,
we treat the case $x \neq y$.

Since $\mathcal{K}(q \setminus \{G\}) \models x \to y$, we can assume a shortest sequence $F_1, F_2, \ldots, F_m$ (call it $\pi$) that is a sequential
proof of $\mathcal{K}(q \setminus \{G\}) \models x \to y$, as defined by Definition 3. Note that $G \notin \{F_1, \ldots, F_m\}$. It will be the case that
y occurs at a non-primary-key position in $F_m$.

The proof runs by induction on the length m of the proof.

Basis If m = 1, then the sequential proof $\pi$ is $F_1$ with $\mathsf{key}(F_1) = \{x\}$. Notice that $\mathsf{key}(F_1) \neq \emptyset$, or else
$y \in G^{+,q}$, a contradiction. The desired sequence that proves the lemma is $F_1$.

Induction Assume m > 1. Consider the last atom $F_m$ in $\pi$. We have $\mathsf{key}(F_m) \not\subseteq G^{+,q}$, or else $y \in G^{+,q}$, a
contradiction. If $x \in \mathsf{vars}(F_m)$, then the desired sequence is $F_m$. In the remainder, we treat the case $x \notin \mathsf{vars}(F_m)$.
We can assume a variable $u \in \mathsf{key}(F_m)$ such that $u \notin G^{+,q}$. There exists an integer k < m such that u occurs
at a non-primary-key position in $F_k$. Then, $F_1, F_2, \ldots, F_k$ contains a shortest subsequence that is a sequential
proof of $\mathcal{K}(q \setminus \{G\}) \models x \to u$, where $u \notin G^{+,q}$. By the induction hypothesis, there exists a sequence
$G_1, \ldots, G_\ell$ of distinct atoms in q such that $x \in \mathsf{vars}(G_1)$, $u \in \mathsf{vars}(G_\ell)$, and for every $i \in \{1, \ldots, \ell-1\}$,
$\mathsf{vars}(G_i) \cap \mathsf{vars}(G_{i+1}) \not\subseteq G^{+,q}$. The desired sequence that proves the lemma is $G_1, \ldots, G_\ell, F_m$. Notice that
$u \in \mathsf{vars}(G_\ell) \cap \mathsf{vars}(F_m)$ and $u \notin G^{+,q}$. □


The following two lemmas are important tools for inferring attacks. 

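Lemma 22 Let q be a self-join-free Boolean conjunctive query. Let $G \in q$ and $y \in \mathsf{vars}(q)$ such that $G \stackrel{q}{\rightsquigarrow} y$. Let
$x \in \mathsf{vars}(q)$ such that $\mathcal{K}(q \setminus \{G\}) \models x \to y$. Then, $G \stackrel{q}{\rightsquigarrow} x$.

Proof From $G \stackrel{q}{\rightsquigarrow} y$, it follows $y \notin G^{+,q}$. A witness for $G \stackrel{q}{\rightsquigarrow} x$ can be obtained by concatenating the sequence
$G_1, \ldots, G_n$ like in the statement of Lemma 21, where $y \in \mathsf{vars}(G_n)$, with a witness of $G \stackrel{q}{\rightsquigarrow} y$. □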


Lemma 23 Let q be a self-join-free Boolean conjunctive query. Let $G \in q$ and $y \in \mathsf{vars}(q)$ such that $G \stackrel{q}{\rightsquigarrow} y$ and
$\mathcal{K}(q) \not\models \mathsf{key}(G) \to y$. If $\mathcal{K}(q) \models x \to y$, then $G \stackrel{q}{\rightsquigarrow} x$.

Proof The desired result is obvious in case x = y. In the remainder of the proof, we treat the case $x \neq y$.
Assume $\mathcal{K}(q) \models x \to y$. Then, we can assume a shortest sequence $F_1, F_2, \ldots, F_n$ that is a sequential proof of
$\mathcal{K}(q) \models x \to y$, as defined by Definition 3.

Let $V = \left(\bigcup_{j=1}^{n} \mathsf{vars}(F_j)\right) \cup \{x\}$. For every $u \in V \setminus \{x\}$, we define the depth of u, denoted d(u), as the smallest
integer j such that $u \in \mathsf{vars}(F_j)$. Furthermore, we define d(x) = 0. Clearly, d(y) = n.

We show next that if G attacks some variable $u \in V$ with d(u) > 0 and $\mathcal{K}(q) \not\models \mathsf{key}(G) \to u$, then also G attacks
some variable $u' \in V$ with d(u') < d(u) and $\mathcal{K}(q) \not\models \mathsf{key}(G) \to u'$.

Assume $G \stackrel{q}{\rightsquigarrow} u$ with d(u) = k > 0 and $\mathcal{K}(q) \not\models \mathsf{key}(G) \to u$. It must be the case that $u \in \mathsf{vars}(F_k) \setminus \mathsf{key}(F_k)$.
Also, $\mathcal{K}(q) \not\models \mathsf{key}(G) \to \mathsf{key}(F_k)$ (otherwise, $\mathcal{K}(q) \models \mathsf{key}(G) \to u$, a contradiction). Then, there must be some
$w \in \mathsf{key}(F_k)$ such that $\mathcal{K}(q) \not\models \mathsf{key}(G) \to w$, which implies $w \notin G^{+,q}$. Clearly, d(w) < k and $G \stackrel{q}{\rightsquigarrow} w$.

It follows $G \stackrel{q}{\rightsquigarrow} x$. □


C.2 Proof of Lemma 10

Proof of Lemma 10 Item 1 Let $\pi = H_1, H_2, \ldots, H_n$ be a shortest sequence that is a sequential proof of
$\mathcal{K}(q) \models x \to z$. Clearly, for $i \in \{1, \ldots, n\}$, we have $\mathcal{K}(q) \models x \to \mathsf{key}(H_i)$, hence $H_i \stackrel{q}{\not\rightsquigarrow} x$ and $H_i \stackrel{q}{\not\rightsquigarrow} z$, by the
assumption in the statement of Lemma 10.

Let db be an uncertain database that is the input to CERTAINTY(q).


Sublemma 5 Let a, b be constants. If some $\preceq_q^{\{x,z\}}$-frugal repair of db satisfies $q_{[x,z \mapsto a,b]}$, then for every repair
$\mathbf{r}_B$ of db, for every valuation $\theta$ over vars(q) such that $\theta(q) \subseteq \mathbf{r}_B$, if $\theta(x) = a$, then $\theta(z) = b$.

Proof Let $\mathbf{r}_A$ be a $\preceq_q^{\{x,z\}}$-frugal repair of db. Let $\theta_A$ be a valuation over vars(q) such that $\theta_A(q) \subseteq \mathbf{r}_A$, and
$\theta_A(x) = a$ and $\theta_A(z) = b$. That is, $\mathbf{r}_A \models q_{[x,z \mapsto a,b]}$. Let $\mathbf{r}_B$ be a repair of db such that for some valuation $\theta_B$
over vars(q), we have $\theta_B(q) \subseteq \mathbf{r}_B$ and $\theta_B(x) = a$. We need to show $\theta_B(z) = b$.

We show how to inductively construct a maximal sequence

$$(p_0, \mathbf{r}_0, \zeta_0), (p_1, \mathbf{r}_1, \zeta_1), \ldots, (p_m, \mathbf{r}_m, \zeta_m)$$

where for every $j \geq 0$,

1. $\mathbf{r}_j$ is a $\preceq_q^{\{x,z\}}$-frugal repair of db;

2. $\zeta_j$ is a valuation over vars(q) such that $\zeta_j(q) \subseteq \mathbf{r}_j$;

3. $\zeta_j(x) = a$ and $\zeta_j(z) = b$, i.e., $\mathbf{r}_j \models q_{[x,z \mapsto a,b]}$;

4. $p_j \in \{0, 1, \ldots, n\}$ and for all $i \in \{1, \ldots, p_j\}$, $\zeta_j(H_i) = \theta_B(H_i)$;

5. $p_0 < p_1 < \cdots < p_j$.

Intuitively, one can think of $p_j$ as an index in $\pi$ indicating that $\zeta_j$ and $\theta_B$ agree on all variables in $H_1, H_2, \ldots, H_{p_j}$.

For the basis of the induction, we choose $(p_0, \mathbf{r}_0, \zeta_0) = (0, \mathbf{r}_A, \theta_A)$. In this way, the above conditions are obviously
satisfied for j = 0.

For the induction step $j \to j+1$, let $p_{j+1}$ be the smallest integer k such that $\zeta_j(H_k) \neq \theta_B(H_k)$. It can be seen
that $\zeta_j(H_k)$ and $\theta_B(H_k)$ must be key-equal. Let $\mathbf{r}_{j+1} = (\mathbf{r}_j \setminus \{\zeta_j(H_k)\}) \cup \{\theta_B(H_k)\}$. By Lemma 19 and since
$\mathbf{r}_j$ is $\preceq_q^{\{x,z\}}$-frugal, it follows $\mathbf{r}_{j+1} \models q_{[x,z \mapsto a,b]}$. So there exists a valuation $\mu$ over vars(q) such that $\mu(q) \subseteq \mathbf{r}_{j+1}$,
and $\mu(x) = a$ and $\mu(z) = b$. From $\mathbf{r}_j \setminus \{\zeta_j(H_k)\} = \mathbf{r}_{j+1} \setminus \{\theta_B(H_k)\}$ and $\mu(x) = \zeta_j(x)$, it will be the case that
$\mu(H_i) = \zeta_j(H_i)$ for all $i \in \{1, \ldots, p_j\}$. By condition 4, $\mu(H_i) = \theta_B(H_i)$ for all $i \in \{1, \ldots, p_j\}$. Then by
our choice of $p_{j+1}$ and our construction of $\mathbf{r}_{j+1}$, we have $\mu(H_i) = \theta_B(H_i)$ for all $i \in \{1, \ldots, p_{j+1}\}$. We choose
$\zeta_{j+1} = \mu$. With these choices, the above conditions 1 to 5 are satisfied for j + 1.

For j = m, we will have that $\zeta_m$ and $\theta_B$ agree on all variables in $\bigcup_{i=1}^{n} \mathsf{vars}(H_i)$. Since $\zeta_m(z) = b$, it follows
$\theta_B(z) = b$. This concludes the proof of Sublemma 5. ⊣


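Sublemma 6 Let $a, b_1, b_2$ be constants such that $b_1 \neq b_2$. If $\mathbf{db} \models q_{[x,z \mapsto a,b_1]}$ and $\mathbf{db} \models q_{[x,z \mapsto a,b_2]}$, then for
every $\preceq_q^{\{x,z\}}$-frugal repair $\mathbf{r}_f$ of db, $\mathbf{r}_f \not\models q_{[x \mapsto a]}$.

Proof Assume the existence of two valuations $\theta_1, \theta_2$ over vars(q) such that $\theta_1(q) \subseteq \mathbf{db}$, $\theta_2(q) \subseteq \mathbf{db}$, $\theta_1(x) =
\theta_2(x) = a$, and $b_1 = \theta_1(z) \neq \theta_2(z) = b_2$. Then, there exist two repairs $\mathbf{r}_1, \mathbf{r}_2$ such that $\theta_1(q) \subseteq \mathbf{r}_1$ and
$\theta_2(q) \subseteq \mathbf{r}_2$.

Assume towards a contradiction the existence of a $\preceq_q^{\{x,z\}}$-frugal repair $\mathbf{r}_f$ of db such that $\mathbf{r}_f \models q_{[x \mapsto a]}$. Then,
we can assume a valuation $\mu$ over vars(q) such that $\mu(q) \subseteq \mathbf{r}_f$ and $\mu(x) = a$. By Sublemma 5, $\theta_1(z) = \mu(z)$ and
$\theta_2(z) = \mu(z)$, hence $\theta_1(z) = \theta_2(z)$, a contradiction. This concludes the proof of Sublemma 6. ⊣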
Construct a maximal sequence

$$\mathbf{db}_0, a_1, \mathbf{db}_1, a_2, \mathbf{db}_2, \ldots, a_\ell, \mathbf{db}_\ell \qquad (12)$$

where $\mathbf{db}_0 = \mathbf{db}$ and for $i \in \{1, \ldots, \ell\}$,

1. there exist two constants $b_i, c_i$ such that $b_i \neq c_i$, $\mathbf{db}_{i-1} \models q_{[x,z \mapsto a_i,b_i]}$, and $\mathbf{db}_{i-1} \models q_{[x,z \mapsto a_i,c_i]}$; and

2. $\mathbf{db}_i$ is obtained from $\mathbf{db}_{i-1}$ by deleting the smallest subset of $\mathbf{db}_{i-1}$ that includes every block b of $\mathbf{db}_{i-1}$
such that $a_i$ occurs in some fact of b. Recall from Section 3 that we assume uncertain databases to be typed.

Then, the following are equivalent:

1. every repair of db satisfies q;

2. every $\preceq_q^{\{x,z\}}$-frugal repair of db satisfies q; and

3. every $\preceq_q^{\{x,z\}}$-frugal repair of $\mathbf{db}_\ell$ satisfies q.

Equivalence of items 1 and 2 follows from Lemma 2. Equivalence of items 2 and 3 follows from Sublemma 6,
using induction on increasing $i \in \{0, \ldots, \ell\}$.


Since the sequence (12 1 is maximal, it must be that db/> llb g 


Let db' be the database that includes $\mathbf{db}_\ell$ and such that for every valuation $\theta$, if $\theta(q) \subseteq \mathbf{db}_\ell$, then db' contains
$T^c(\theta(x), \theta(z))$. Clearly, the set of $T^c$-facts of db' is consistent, and the following are equivalent:

1. every $\preceq_q^{\{x,z\}}$-frugal repair of $\mathbf{db}_\ell$ satisfies q;

2. every $\preceq_{q \cup \{T^c(x,z)\}}^{\{x,z\}}$-frugal repair of db' satisfies $q \cup \{T^c(x, z)\}$; and

3. every repair of db' satisfies $q \cup \{T^c(x, z)\}$.

Finally, it can be easily seen that db' can be computed from db in polynomial time. This concludes the proof of
the first item.


Item 3 Define $q' = q \cup \{T^c(x, z)\}$. We show that for all $F, G \in q$, if $F \stackrel{q'}{\rightsquigarrow} G$, then $F \stackrel{q}{\rightsquigarrow} G$. For every attack
$F \stackrel{q'}{\rightsquigarrow} G$, we distinguish two cases depending on F.

Case $\mathcal{K}(q \setminus \{F\}) \models x \to z$. Then clearly, $F^{+,q} = F^{+,q'}$. The only hard case is where a witness for the attack
$F \stackrel{q'}{\rightsquigarrow} G$ contains the atom $T^c(x, z)$. Then, $z \notin F^{+,q'}$, hence $z \notin F^{+,q}$. From Lemma 21, it follows that there
exists a witness for $F \stackrel{q}{\rightsquigarrow} G$.

Case $\mathcal{K}(q \setminus \{F\}) \not\models x \to z$. Since $\mathcal{K}(q) \models x \to z$, it must be the case that every sequential proof of $\mathcal{K}(q) \models
x \to z$ contains F. Then $\mathcal{K}(q) \models x \to \mathsf{key}(F)$. By the assumption in the statement of Lemma 10, $F \stackrel{q}{\not\rightsquigarrow} x$ and
$F \stackrel{q}{\not\rightsquigarrow} z$. Assume towards a contradiction that a witness of $F \stackrel{q'}{\rightsquigarrow} G$ contains $T^c(x, z)$. Then, since $F^{+,q} \subseteq F^{+,q'}$,
it must be the case that $F \stackrel{q}{\rightsquigarrow} x$ or $F \stackrel{q}{\rightsquigarrow} z$, a contradiction. We conclude by contradiction that no witness of
$F \stackrel{q'}{\rightsquigarrow} G$ contains $T^c(x, z)$. Since $F^{+,q} \subseteq F^{+,q'}$, it follows $F \stackrel{q}{\rightsquigarrow} G$.

Assume that the attack graph of q' contains a strong cycle C. Since the atom $T^c(x, z)$ cannot be in C (since it has
no outgoing attacks), the attack graph of q contains the same cycle C. It can be easily seen that C is strong in the
attack graph of q. □


C.3 Proof of Lemma 11

We first show two helping lemmas. 

Lemma 24 Let q be a self-join-free Boolean conjunctive query. Let F be an atom of q. Let G be an atom with a 
fresh relation name such that key($G$) = key($F$) and vars($G$) = vars($F$). Let $q' = (q \setminus \{F\}) \cup \{G\}$. Then,

1. there exists a polynomial-time many-one reduction from CERTAINTY($q$) to CERTAINTY($q'$); and

2. if the attack graph of $q$ contains no strong cycle, then the attack graph of $q'$ contains no strong cycle either.

Proof The proof of the second item is straightforward.

For the first item, let $\mathbf{db}$ be an uncertain database that is input to CERTAINTY($q$). By Lemma 1, we can compute in polynomial time a database $\mathbf{db}_p$ such that $\mathbf{db}_p$ is purified relative to $q$ and such that every repair of $\mathbf{db}$ satisfies $q$ if and only if every repair of $\mathbf{db}_p$ satisfies $q$.

Let $\mathbf{db}'$ be the uncertain database that includes $\mathbf{db}_p$ and such that whenever $\mathbf{db}_p$ contains $\theta(F)$ for some valuation $\theta$ over vars($F$), then $\mathbf{db}'$ contains $\theta(G)$. Notice here that vars($F$) = vars($G$) and, since $\mathbf{db}_p$ is purified, whenever $A \in \mathbf{db}_p$ has the same relation name as $F$, then there exists a valuation $\theta$ over vars($F$) such that $A = \theta(F)$. It can now be easily verified that every repair of $\mathbf{db}_p$ satisfies $q$ if and only if every repair of $\mathbf{db}'$ satisfies $q'$. □


Notice that the roles of $F$ and $G$ can be switched in the statement of Lemma 24, showing that CERTAINTY($q$) and CERTAINTY($q'$) are polynomially equivalent.


Example 20 If $F = R(a,x,x,y,y,z,z,b,u)$ and $G = S(x,y,z,u)$, then key($F$) = key($G$) and vars($F$) = vars($G$). So Lemma 24 implies that we can replace $F$ with $G$ in the study of CERTAINTY($q$). $\triangleleft$

Lemma 25 Let $q$ be a self-join-free Boolean conjunctive query. Let $R(\underline{\vec{x}}, \vec{y})$ be an atom of $q$ with mode $i$. Let $q_0 = \{R_1(\underline{\vec{x}}, w), R_2(\underline{w}, \vec{x}), S(\underline{w}, \vec{y})\}$, where $R_1, R_2$ are fresh relation names of mode $c$, $S$ is a fresh relation name of mode $i$, and $w$ is a variable such that $w \notin \mathrm{vars}(q)$. Let $q' = (q \setminus \{R(\underline{\vec{x}}, \vec{y})\}) \cup q_0$. Then,

1. there exists a polynomial-time many-one reduction from CERTAINTY($q$) to CERTAINTY($q'$); and

2. if the attack graph of $q$ contains no strong cycle, then the attack graph of $q'$ contains no strong cycle either.


Proof Item 1. Assume that the signature of $R$ is $[n, k]$. Let $\mathbf{db}$ be an uncertain database that is input to CERTAINTY($q$). Define an injective function $h$ that maps every element in $(\mathrm{adom}(\mathbf{db}))^k$ to a fresh constant not occurring elsewhere. Let $\mathbf{db}'$ be the database obtained from $\mathbf{db}$ by replacing each fact $R(\vec{a}, \vec{b})$ with the following three facts:

$$R_1(\vec{a}, h(\vec{a})),\quad R_2(h(\vec{a}), \vec{a}),\quad\text{and}\quad S(h(\vec{a}), \vec{b}).$$

Since the function $h$ is injective, the set of $R_1$-facts and $R_2$-facts of $\mathbf{db}'$ is consistent. Hence, $\mathbf{db}'$ is a legal input to CERTAINTY($q'$). Intuitively, $R_1$-facts encode the function $h$, and $R_2$-facts affirm that $h$ is injective. It remains to be shown that every repair of $\mathbf{db}$ satisfies $q$ if and only if every repair of $\mathbf{db}'$ satisfies $q'$.

Define $f : \mathrm{rset}(\mathbf{db}) \to \mathrm{rset}(\mathbf{db}')$ such that for every $\mathbf{r} \in \mathrm{rset}(\mathbf{db})$,

• if $\mathbf{r}$ contains $R(\vec{a}, \vec{b})$, then $f(\mathbf{r})$ contains $S(h(\vec{a}), \vec{b})$;

• $f(\mathbf{r})$ contains all $R_1$-facts and all $R_2$-facts of $\mathbf{db}'$; and

• if $T$ is a relation name in $q$ such that $T \neq R$, then $f(\mathbf{r})$ contains exactly the same $T$-facts as $\mathbf{r}$.

The following can be easily verified for every $\mathbf{r} \in \mathrm{rset}(\mathbf{db})$:

• $f(\mathbf{r})$ is indeed a repair of $\mathbf{db}'$; and

• $q$ is true in $\mathbf{r}$ if and only if $q'$ is true in $f(\mathbf{r})$.

The desired result follows from the easy observation that $f$ is bijective.
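The construction of the database in Item 1 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: facts are modeled as pairs `(relation_name, args)`, the names `reduce_r_facts`, `rel`, and `key_len` are hypothetical, the relation names `R1`, `R2`, `S` are assumed to be fresh, and the tuples `("h", n)` stand in for fresh constants assumed not to occur in the input. Unlike the proof, the sketch defines $h$ only on key values that actually occur in some $R$-fact.

```python
from itertools import count


def reduce_r_facts(db, rel="R", key_len=1):
    """Replace each fact rel(a, b) by R1(a, h(a)), R2(h(a), a), S(h(a), b).

    db is a set of facts (relation_name, args), where args is a tuple of
    constants and the first key_len positions of a rel-fact are its key.
    """
    fresh = count()
    h = {}  # injective map from key tuples to fresh constants
    out = set()
    for name, args in db:
        if name != rel:
            out.add((name, args))  # facts of other relations are kept as-is
            continue
        a, b = args[:key_len], args[key_len:]
        if a not in h:
            h[a] = ("h", next(fresh))  # assumption: does not occur in db
        out.add(("R1", a + (h[a],)))   # encodes the function h
        out.add(("R2", (h[a],) + a))   # affirms that h is injective
        out.add(("S", (h[a],) + b))    # keeps the block structure of rel
    return out
```

For a block of two facts `R(a, b1)` and `R(a, b2)`, the sketch produces a single `R1`-fact, a single `R2`-fact, and two `S`-facts that share the key `h(a)`, so the primary-key violation of `R` is carried over to `S` only.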


Item 2. By a little abuse of notation, we will denote atoms by their relation name. First, observe that $\mathcal{K}([\![q']\!]) \models w \to \mathrm{vars}(\vec{x})$ and $\mathcal{K}([\![q']\!]) \models \mathrm{vars}(\vec{x}) \to w$. This implies that for any atom $F \in q \setminus \{R\}$, we have $F^{+,q} = F^{+,q'} \setminus \{w\}$. Furthermore, $R^{+,q} = S^{+,q'} \setminus \{w\}$.


Notice that atoms $R_1$ and $R_2$ have mode $c$, and hence have no outgoing attacks in the attack graph of $q'$. We will now show that for all $F, G \in q \setminus \{R\}$:

• if $S \overset{q'}{\rightsquigarrow} G$, then $R \overset{q}{\rightsquigarrow} G$;

• if $F \overset{q'}{\rightsquigarrow} S$, then $F \overset{q}{\rightsquigarrow} R$; and

• if $F \overset{q'}{\rightsquigarrow} G$, then $F \overset{q}{\rightsquigarrow} G$.

7 We know that there exists such a first-order reduction. However, polynomial-time is sufficient here and allows for an easier proof.

To this extent, assume an attack $F \overset{q'}{\rightsquigarrow} G$ where $F, G \in (q \setminus \{R\}) \cup \{S\}$. We can assume a witness

$$F_0 \overset{z_1}{\frown} F_1 \overset{z_2}{\frown} F_2 \dots \overset{z_n}{\frown} F_n \qquad (13)$$

for $F \overset{q'}{\rightsquigarrow} G$ where $F_0 = F$ and $F_n = G$. We can assume without loss of generality that $1 \leq i < j \leq n$ implies $z_i \neq z_j$, and that $0 \leq i < j \leq n$ implies $F_i \neq F_j$. Moreover, since vars($R_1$) = vars($R_2$), we can assume that $R_2$ does not occur in (13). We distinguish three cases.

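Case $F_0 = S$. Since $\{w\} \cup \mathrm{vars}(\vec{x}) \subseteq S^{+,q'}$, we have that $R_1$ and $R_2$ do not occur in the sequence (13). Then,

$$R \overset{z_1}{\frown} F_1 \overset{z_2}{\frown} F_2 \dots \overset{z_n}{\frown} F_n$$

is a witness for $R \overset{q}{\rightsquigarrow} F_n$.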


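Case $F_n = S$. It may be the case that $w \in \{z_1, \dots, z_n\}$. Then, by the form of $q_0$, we can assume a smallest integer $i$ such that $z_i \in \mathrm{vars}(\vec{x}) \cup \mathrm{vars}(\vec{y})$. Then, $F_0 \overset{z_1}{\frown} F_1 \overset{z_2}{\frown} F_2 \dots \overset{z_i}{\frown} R$ is a witness for $F_0 \overset{q}{\rightsquigarrow} R$.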

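Case $F_0 \neq S \neq F_n$. The only hard case is when the sequence (13) is of one of the following forms:

$$F_0 \dots \overset{x}{\frown} R_1 \overset{w}{\frown} S \overset{y}{\frown} \dots F_n,$$

or

$$F_0 \dots \overset{y}{\frown} S \overset{w}{\frown} R_1 \overset{x}{\frown} \dots F_n,$$

where $x \in \mathrm{vars}(\vec{x})$ and $y \in \mathrm{vars}(\vec{y})$. Then, $y \notin F_0^{+,q'}$ and $x \notin F_0^{+,q'}$. It follows $y \notin F_0^{+,q}$ and $x \notin F_0^{+,q}$, which implies that we can replace the subsequence $R_1 \overset{w}{\frown} S$ (or $S \overset{w}{\frown} R_1$) with $R$ to obtain a witness for $F_0 \overset{q}{\rightsquigarrow} F_n$.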

It follows that every cycle in the attack graph of $q'$ is present in the attack graph of $q$ modulo a replacement of $S$ with $R$.

Assume that the attack graph of $q$ contains no strong cycle. Let $C'$ be an elementary directed cycle in the attack graph of $q'$. Let $C$ be the directed cycle in the attack graph of $q$ obtained from $C'$ by replacing $S$ with $R$. The attack cycle $C$ must be weak. Then, the attack cycle $C'$ will be weak, because for every $F, G \in q \setminus \{R\}$,

• if $\mathcal{K}(q) \models \mathrm{key}(R) \to \mathrm{key}(G)$, then $\mathcal{K}(q') \models \mathrm{key}(S) \to \mathrm{key}(G)$;

• if $\mathcal{K}(q) \models \mathrm{key}(F) \to \mathrm{key}(R)$, then $\mathcal{K}(q') \models \mathrm{key}(F) \to \mathrm{key}(S)$; and

• if $\mathcal{K}(q) \models \mathrm{key}(F) \to \mathrm{key}(G)$, then $\mathcal{K}(q') \models \mathrm{key}(F) \to \mathrm{key}(G)$.

This concludes the proof. □ 


The proof of Lemma 11 is now straightforward.

Proof of Lemma 11 Apply the reductions of Lemmas 24 and 25. Then repeatedly apply the reduction of Lemma 10 until it can no longer be applied. Notice that the reduction of Lemma 10 consists in adding atoms of the form $T^c(x, z)$. □


C.4 Proof of Lemma 13

Proof of Lemma 13 Assume that $k, x_0, \dots, x_{k-1}, \vec{y}, q_0, q_1$ are as in Definition 6. Let $K = T(\underline{u}, x_0, \dots, x_{k-1}, \vec{y})$.

Since the Markov cycle $C$ is premier, we can assume an atom $F_0 \in q$ with mode $i$ and $x \in \mathrm{vars}(q)$ such that key($F_0$) $= \{x\}$, the Markov graph of $q$ contains a directed path from $x$ to $x_0$, and $\mathcal{K}(q) \models x_0 \to x$.

Assume that the attack graph of q contains no strong cycle. 

Sublemma 7 $\mathcal{K}(q_0 \cup [\![q]\!]) \cup \{u \to x_0, x_0 \to u\} \models \mathcal{K}(q_1)$.

Proof $\mathcal{K}(q_1)$ is logically equivalent to $\{u \to z \mid z \in \mathrm{vars}(q_0)\} \cup \{x_i \to u \mid 0 \leq i \leq k-1\}$.

Let $z \in \mathrm{vars}(q_0)$. Clearly, for all $i, j \in \{0, \dots, k-1\}$, $\mathcal{K}(q_0 \cup [\![q]\!]) \models x_i \to x_j$. It is then obvious that $\mathcal{K}(q_0 \cup [\![q]\!]) \models x_0 \to z$. Hence, $\mathcal{K}(q_0 \cup [\![q]\!]) \cup \{u \to x_0, x_0 \to u\} \models u \to z$.

Let $i \in \{0, \dots, k-1\}$. As argued before, $\mathcal{K}(q_0 \cup [\![q]\!]) \models x_i \to x_0$. Hence, $\mathcal{K}(q_0 \cup [\![q]\!]) \cup \{u \to x_0, x_0 \to u\} \models x_i \to u$.

It follows that every functional dependency of $\mathcal{K}(q_1)$ is logically implied by $\mathcal{K}(q_0 \cup [\![q]\!]) \cup \{u \to x_0, x_0 \to u\}$. $\dashv$
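Implication checks between sets of functional dependencies, such as the ones in the proof above, can be carried out mechanically with the standard attribute-closure fixpoint. The following Python sketch is illustrative only: the function names `closure` and `implies` are hypothetical, and the dependency set `fds` is a toy instance (assuming $k = 2$ and a single further variable `z` in vars($q_0$)), not the dependencies of any particular query from the paper.

```python
def closure(attrs, fds):
    """Return the set of variables determined by attrs under fds.

    fds is a list of pairs (lhs, rhs), each an iterable of variable names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its left-hand side is fully derived.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Test whether fds logically implies the dependency lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)


# Toy instance (hypothetical): x0 and x1 determine each other, x0 -> z,
# plus the two added dependencies u -> x0 and x0 -> u.
fds = [
    (["x0"], ["x1"]), (["x1"], ["x0"]),
    (["x0"], ["z"]),
    (["u"], ["x0"]), (["x0"], ["u"]),
]
```

On this toy instance, `implies(fds, ["u"], ["z"])` and `implies(fds, ["x1"], ["u"])` both hold, which mirrors the two kinds of dependencies derived in the proof.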


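Sublemma 8 $\mathcal{K}(q_1) \models \mathcal{K}(q_0) \cup \{u \to x_0, x_0 \to u\}$.

Proof Obviously, $\mathcal{K}(q_1) \models u \to x_0$, $\mathcal{K}(q_1) \models x_0 \to u$, and for every $i \in \{0, \dots, k-1\}$, $\mathcal{K}(q_1) \models x_i \to \mathrm{vars}(q_0)$. Every atom of $q_0$ is of the form $R(\underline{x_i}, \vec{z})$ where $i \in \{0, \dots, k-1\}$ and $\mathrm{vars}(\vec{z}) \subseteq \mathrm{vars}(q_0)$. Since $\mathcal{K}(q_1) \models x_i \to \mathrm{vars}(q_0)$, we have $\mathcal{K}(q_1) \models x_i \to \mathrm{vars}(\vec{z})$. $\dashv$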


Sublemmas 7 and 8 immediately lead to the following results.

Sublemma 9 $\mathcal{K}(q^*) \equiv \mathcal{K}(q) \cup \{u \to x_0, x_0 \to u\}$.

Sublemma 10 For every $F \in q \setminus q_0$ such that $F$ has mode $i$, we have $\mathcal{K}(q^* \setminus \{F\}) \equiv \mathcal{K}(q \setminus \{F\}) \cup \{x_0 \to u, u \to x_0\}$.


Sublemma 11 For every $F \in q \setminus q_0$ such that $F$ has mode $i$, we have $F^{+,q} = F^{+,q^*} \setminus \{u\}$.

Proof Let $F \in q \setminus q_0$ such that the mode of $F$ is $i$. From Sublemma 10 it follows that $F^{+,q} \subseteq F^{+,q^*}$. Since $u \notin \mathrm{vars}(q)$, it follows $F^{+,q} \subseteq F^{+,q^*} \setminus \{u\}$.

The inclusion $F^{+,q^*} \setminus \{u\} \subseteq F^{+,q}$ follows from Sublemma and the observation that in the computation of $F^{+,q^*}$, the functional dependencies $x_0 \to u$ and $u \to x_0$ are useless, except for inferring $u \in F^{+,q^*}$ from $x_0 \in F^{+,q^*}$. $\dashv$


All $q_1$-atoms have mode $c$ and hence have no outgoing attacks in the attack graph of $q^*$. The following lemma states that all attacks among atoms of $q \setminus q_0$ in the attack graph of $q^*$ are also present in the attack graph of $q$.

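Sublemma 12 For all $F, G \in q \setminus q_0$, if $F \overset{q^*}{\rightsquigarrow} G$, then $F \overset{q}{\rightsquigarrow} G$.

Proof Let $F, G \in q \setminus q_0$ such that $F \overset{q^*}{\rightsquigarrow} G$. Then, we can assume a witness for $F \overset{q^*}{\rightsquigarrow} G$ of the following form:

$$H_0 \overset{z_1}{\frown} H_1 \overset{z_2}{\frown} \dots \overset{z_n}{\frown} H_n, \qquad (14)$$

where $H_0 = F$ and $H_n = G$. We can assume without loss of generality that $1 \leq i < j \leq n$ implies $z_i \neq z_j$, and that $0 \leq i < j \leq n$ implies $H_i \neq H_j$. Since $H_0^{+,q} \subseteq H_0^{+,q^*}$ by Sublemma 11, it follows that

$$\{z_1, \dots, z_n\} \cap H_0^{+,q} = \emptyset.$$

If the sequence (14) contains no atom of $q_1$, then it is also a witness for $F \overset{q}{\rightsquigarrow} G$, and the desired result holds. In the remainder, assume that the sequence (14) contains an atom of $q_1$. Because of the structure of $q_1$, we can assume without loss of generality that $K$ is the only atom of $q_1$ that occurs in the sequence (14). So we can assume $\ell \in \{1, \dots, n-1\}$ such that $H_\ell = K$. Clearly, $z_\ell, z_{\ell+1} \in \mathrm{vars}(q_0)$ and by Sublemma 11, $z_\ell, z_{\ell+1} \notin F^{+,q}$.

For the variable $z_{\ell+1}$, there exists some $i \in \{0, \dots, k-1\}$ such that either $z_{\ell+1} = x_i$ or the atom $R(x_i, z_{\ell+1})$ belongs to $q_0$. Since $\mathcal{K}(q_0 \cup [\![q]\!]) \models x_i \to x_j$ for all $i, j \in \{0, \dots, k-1\}$, it follows $\mathcal{K}(q \setminus \{F\}) \models x_i \to z_\ell$.

From $F \overset{q}{\rightsquigarrow} z_\ell$, it follows $F \overset{q}{\rightsquigarrow} x_i$ by Lemma 22 and hence $F \overset{q}{\rightsquigarrow} z_{\ell+1}$. It can then be easily seen that there exists a witness for $F \overset{q}{\rightsquigarrow} G$. $\dashv$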


We finally focus on attacks in the attack graph of q* that involve the atom K. 

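Sublemma 13 For every $H \in q^*$, if $H \overset{q^*}{\rightsquigarrow} K$, then $H \in q \setminus q_0$, and both $\mathcal{K}(q) \models \mathrm{key}(F_0) \to \mathrm{key}(H)$ and $\mathcal{K}(q) \models \mathrm{key}(H) \to \mathrm{key}(F_0)$.

Proof Let $H \in q^*$ such that $H \overset{q^*}{\rightsquigarrow} K$. Since $q_1$-atoms have no outgoing attacks in the attack graph of $q^*$, it must be the case that $H \in q \setminus q_0$. The Markov graph of $q$ contains a directed path from $x$ to $x_0$ (recall $\{x\} = \mathrm{key}(F_0)$); let $M$ be the set of variables on this path. We now distinguish two cases.

• If $\mathrm{key}(H) \subseteq M$, then clearly $\mathcal{K}(q) \models \mathrm{key}(F_0) \to \mathrm{key}(H)$. Since $\mathcal{K}(q) \models \mathrm{key}(H) \to x_0$ and $\mathcal{K}(q) \models x_0 \to \mathrm{key}(F_0)$, we obtain $\mathcal{K}(q) \models \mathrm{key}(H) \to \mathrm{key}(F_0)$.

• Otherwise, $\mathcal{K}(q \setminus \{H\}) \models \mathrm{key}(F_0) \to z$ for every $z \in \mathrm{vars}(q_0)$. Since $H \overset{q^*}{\rightsquigarrow} K$, it must be that $H \overset{q}{\rightsquigarrow} z$ for some $z \in \mathrm{vars}(q_0)$. Then, $H \overset{q}{\rightsquigarrow} x$ by Lemma 22 and consequently $H \overset{q}{\rightsquigarrow} F_0$. Then, it must be the case that $H$ belongs to the initial strong component of the attack graph of $q$ that also contains $F_0$. Since the attack graph of $q$ contains no strong cycle, we have $\mathcal{K}(q) \models \mathrm{key}(F_0) \to \mathrm{key}(H)$ and $\mathcal{K}(q) \models \mathrm{key}(H) \to \mathrm{key}(F_0)$.

This concludes the proof of Sublemma 13. $\dashv$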


We can now complete the proof of Lemma 13. Assume towards a contradiction that the attack graph of $q^*$ contains a strong cycle. By Lemma 4, the attack graph of $q^*$ contains a strong cycle of size 2. So we can assume atoms $H_0, H_1 \in q^*$ such that $H_0 \overset{q^*}{\rightsquigarrow} H_1 \overset{q^*}{\rightsquigarrow} H_0$, and at least one of the attacks is strong.

Case $H_0, H_1 \in q \setminus q_0$. By Sublemma, $H_0 \overset{q}{\rightsquigarrow} H_1 \overset{q}{\rightsquigarrow} H_0$. Since the attack graph of $q$ contains no strong attack cycles, we have $\mathcal{K}(q) \models \mathrm{key}(H_0) \to \mathrm{key}(H_1)$ and $\mathcal{K}(q) \models \mathrm{key}(H_1) \to \mathrm{key}(H_0)$. From Sublemma 9, it follows $\mathcal{K}(q^*) \models \mathrm{key}(H_0) \to \mathrm{key}(H_1)$ and $\mathcal{K}(q^*) \models \mathrm{key}(H_1) \to \mathrm{key}(H_0)$, contradicting that $H_0 \overset{q^*}{\rightsquigarrow} H_1 \overset{q^*}{\rightsquigarrow} H_0$ is a strong attack cycle.

Case $H_0 = K$ (the case $H_1 = K$ is symmetrical). Then, $\mathrm{key}(H_0) = \{u\}$. By Sublemma 13, $H_1 \in q \setminus q_0$, and both $\mathcal{K}(q) \models \mathrm{key}(F_0) \to \mathrm{key}(H_1)$ and $\mathcal{K}(q) \models \mathrm{key}(H_1) \to \mathrm{key}(F_0)$. From Sublemma 9 and $\mathcal{K}(q) \models x_0 \to \mathrm{key}(F_0)$, it follows $\mathcal{K}(q^*) \models u \to \mathrm{key}(H_1)$. From Sublemma 9 and $\mathcal{K}(q) \models \mathrm{key}(F_0) \to x_0$ (because there is a Markov path from $x$ to $x_0$), it follows $\mathcal{K}(q^*) \models \mathrm{key}(H_1) \to u$. But then $H_0 \overset{q^*}{\rightsquigarrow} H_1 \overset{q^*}{\rightsquigarrow} H_0$ is a weak attack cycle, a contradiction.

In both cases, we conclude by contradiction that the attack graph of $q^*$ contains no strong attack cycle. □


C.5 Proof of Lemma 14


We use the following helping lemma. 

Lemma 26 Let $q$ be a self-join-free Boolean conjunctive query such that

• for every atom $F \in q$, if $F$ has mode $i$, then $F$ is simple-key and $\mathrm{key}(F) \neq \emptyset$;

• $q$ is saturated; and

• the attack graph of $q$ contains no strong cycle.

Let $F_0$ be an atom of $q$ that belongs to an initial strong component of the attack graph of $q$, and let $\mathrm{key}(F_0) = \{y\}$. Let $x \in \mathrm{vars}(q)$ such that $\mathcal{K}(q) \models x \to y$ and $\mathcal{K}(q) \models y \to x$. Then, there exists $z \in \mathrm{vars}(q)$ with $\mathcal{C}_q(z) \neq \emptyset$ such that $x \overset{M}{\longrightarrow} z$ and $\mathcal{K}(q) \models z \to y$.

Proof If $x \overset{M}{\longrightarrow} y$, then the desired result holds for $z = y$. In the remainder of the proof, we treat the case $x \overset{M}{\not\longrightarrow} y$.

Let $q_0$ be a minimal (with respect to $\subseteq$) subset of $q$ such that $\mathcal{K}(\mathcal{C}_q(x) \cup [\![q]\!] \cup q_0) \models x \to y$. Obviously, $q_0 \cap \mathcal{C}_q(x) = \emptyset$ and $q_0 \cap [\![q]\!] = \emptyset$. Let $p$ be a minimal (with respect to $\subseteq$) subset of $\mathcal{C}_q(x) \cup [\![q]\!] \cup q_0$ such that the atoms of $p$ can be sequentially ordered into a sequential proof (call it $\pi$) of $\mathcal{K}(q) \models x \to y$. Clearly, $\pi$ must contain all atoms of $q_0$.

From $x \overset{M}{\not\longrightarrow} y$, it follows $\mathcal{K}(\mathcal{C}_q(x) \cup [\![q]\!]) \not\models x \to y$. Hence, $q_0 \neq \emptyset$. Let $G$ be the leftmost atom in $\pi$ such that $G \in q_0$. Notice that $\mathrm{key}(G) \neq \emptyset$ by the premise in the statement of Lemma 26. We can assume $z \in \mathrm{vars}(q)$ such that $G \in \mathcal{C}_q(z)$. Since $G$ is chosen leftmost, $\mathcal{K}(\mathcal{C}_q(x) \cup [\![q]\!]) \models x \to z$, hence $x \overset{M}{\longrightarrow} z$ and $\mathcal{C}_q(z) \neq \emptyset$. It remains to be shown that $\mathcal{K}(q) \models z \to y$.

Assume towards a contradiction that $\mathcal{K}(q) \not\models z \to y$. In the next paragraph, we show that $\pi$ contains an atom $H$ such that for some $w_1, w_2 \in \mathrm{key}(H)$,

1. $\mathcal{K}(q) \models z \to w_1$ but $\mathcal{K}([\![q]\!]) \not\models z \to w_1$, and

2. $\mathcal{K}(q) \not\models z \to w_2$.


Existence of $H$, $w_1$, and $w_2$. Let $V = \mathrm{vars}(p) \cup \{x\}$ and let the sequential proof $\pi$ be $H_1, H_2, \dots, H_\ell$. For every $u \in V \setminus \{x\}$, we define the depth of $u$, denoted $d(u)$, as the smallest integer $j$ such that $u \in \mathrm{vars}(H_j)$. Furthermore, we define $d(x) = 0$. Clearly, $d(y) = \ell$.

For $u \in V$ and $i, j \in \{0, \dots, \ell\}$, we write $i \overset{u}{\rightarrowtail} j$ if $d(u) = i$ and $j \in \{i+1, \dots, \ell\}$ such that $u \in \mathrm{key}(H_j)$. Intuitively, if $i > 0$, then $i \overset{u}{\rightarrowtail} j$ says that the variable $u$ is introduced in the sequential proof by $H_i$, and “used” later on by $H_j$. We can assume $k \in \{1, \dots, \ell\}$ such that $G = H_k$. Clearly, $d(z) < k$. It can be easily seen that the following can be assumed without loss of generality.

Simple-Things-First Condition: for every $u \in V$, if $\mathcal{K}([\![q]\!]) \models z \to u$, then $d(u) < k$.

Since no atom of $\pi$ is redundant, there exists a sequence

$$k_0 \overset{u_1}{\rightarrowtail} k_1 \overset{u_2}{\rightarrowtail} k_2 \dots \overset{u_m}{\rightarrowtail} k_m$$

where $k_0 = k$ and $k_m = \ell$. Thus, $y$ occurs at a non-primary-key position in $H_{k_m}$. For all $i \in \{1, \dots, m\}$, $d(u_i) \geq k$, hence $\mathcal{K}([\![q]\!]) \not\models z \to u_i$ by the Simple-Things-First Condition.

Since $\mathcal{K}(q) \not\models z \to y$, we have $\mathcal{K}(q) \not\models z \to \mathrm{key}(H_{k_m})$. Hence, we can assume a smallest integer $j \in \{1, 2, \dots, m\}$ such that $\mathcal{K}(q) \not\models z \to \mathrm{key}(H_{k_j})$. Then obviously, $\mathcal{K}(q) \models z \to \mathrm{key}(H_{k_{j-1}})$, hence $\mathcal{K}(q) \models z \to u_j$. We can choose $w_1 = u_j$ and $H = H_{k_j}$. Further, since $\mathcal{K}(q) \not\models z \to \mathrm{key}(H_{k_j})$, we can choose $w_2 \in \mathrm{key}(H_{k_j})$ such that $\mathcal{K}(q) \not\models z \to w_2$. We conclude that $H$, $w_1$, and $w_2$ indeed exist.

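Since $q$ is saturated, from $\mathcal{K}(q) \models z \to w_1$ and $\mathcal{K}([\![q]\!]) \not\models z \to w_1$, it follows that there exists an atom $G' \in q$ such that $\mathcal{K}(q) \models z \to \mathrm{key}(G')$ and such that either $G' \overset{q}{\rightsquigarrow} z$ or $G' \overset{q}{\rightsquigarrow} w_1$. Clearly, $G'$ is an atom with mode $i$.

We show $\mathcal{K}(q \setminus \{G'\}) \models x \to z$. Assume towards a contradiction that $\mathcal{K}(q \setminus \{G'\}) \not\models x \to z$. Since $\mathcal{K}(\mathcal{C}_q(x) \cup [\![q]\!]) \models x \to z$, it must be the case that $G' \in \mathcal{C}_q(x)$, hence $\mathrm{key}(G') = \{x\}$. Then, from $\mathcal{K}(q) \models z \to \mathrm{key}(G')$ and $\mathcal{K}(q) \models x \to y$, it follows $\mathcal{K}(q) \models z \to y$, a contradiction. We conclude by contradiction that $\mathcal{K}(q \setminus \{G'\}) \models x \to z$.

Two cases can occur.

Case $G' \overset{q}{\rightsquigarrow} w_1$. Since $\mathcal{K}(q) \not\models \mathrm{key}(G') \to w_2$ (or otherwise $\mathcal{K}(q) \models z \to w_2$, a contradiction), we have $w_2 \notin G'^{+,q}$, hence $G' \overset{q}{\rightsquigarrow} w_2$. Since $\mathcal{K}(q) \models x \to w_2$, it follows by Lemma that $G' \overset{q}{\rightsquigarrow} x$.

Case $G' \overset{q}{\rightsquigarrow} z$. Since $\mathcal{K}(q \setminus \{G'\}) \models x \to z$, we have that $G' \overset{q}{\rightsquigarrow} x$ by Lemma 22.

Thus, at this part of the proof, we have $G' \overset{q}{\rightsquigarrow} x$. We now distinguish two cases.

Case $\mathcal{K}(q) \models \mathrm{key}(G') \to x$. From $\mathcal{K}(q) \models z \to \mathrm{key}(G')$ and $\mathcal{K}(q) \models x \to y$, we have $\mathcal{K}(q) \models z \to y$, a contradiction.

Case $\mathcal{K}(q) \not\models \mathrm{key}(G') \to x$. From $\mathcal{K}(q) \models y \to x$ and $G' \overset{q}{\rightsquigarrow} x$, it follows from Lemma that $G' \overset{q}{\rightsquigarrow} y$, which implies $G' \overset{q}{\rightsquigarrow} F_0$. Since $F_0$ belongs to an initial strong component of $q$'s attack graph and since the attack graph of $q$ contains no strong cycle, the attack $G' \overset{q}{\rightsquigarrow} F_0$ must be weak, so $\mathcal{K}(q) \models \mathrm{key}(G') \to y$. Since $\mathcal{K}(q) \models z \to \mathrm{key}(G')$, we obtain $\mathcal{K}(q) \models z \to y$, a contradiction.

We conclude by contradiction that $\mathcal{K}(q) \models z \to y$. □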


The proof of Lemma 14 is given next.

Figure 4: Markov graph of the query in Example 21


Proof of Lemma 14 By repeated application of Lemma 3, the initial strong component with two or more atoms will contain two atoms $F_0, G$ such that $F_0 \overset{q}{\rightsquigarrow} G \overset{q}{\rightsquigarrow} F_0$.

Let $\{w_0\} = \mathrm{key}(F_0)$ (and thus $\mathcal{C}_q(w_0) \neq \emptyset$) and $\{y\} = \mathrm{key}(G)$. Since the attack graph of $q$ contains no strong cycle, we have $\mathcal{K}(q) \models w_0 \to y$ and $\mathcal{K}(q) \models y \to w_0$. By Lemma 26, there exists $w_1 \in \mathrm{vars}(q)$ such that $w_0 \overset{M}{\longrightarrow} w_1$, $\mathcal{C}_q(w_1) \neq \emptyset$, and $\mathcal{K}(q) \models w_1 \to y$. The latter implies that $\mathcal{K}(q) \models w_1 \to w_0$ as well.

By repeated application of Lemma, for every $k \geq 0$, there exists a Markov path $w_0 \overset{M}{\longrightarrow} w_1 \cdots \overset{M}{\longrightarrow} w_k$, where $\mathcal{C}_q(w_i) \neq \emptyset$ for every $i \in \{0, \dots, k\}$, and $\mathcal{K}(q) \models w_k \to w_0$. Since $\mathrm{vars}(q)$ is a finite set, at some point we will have $w_k = w_i$ for some $i$ with $i < k$, at which point we have found the desired Markov cycle. □
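The final step of this argument (extend a path one edge at a time until a vertex repeats) can be sketched in Python. The sketch is only illustrative and rests on assumptions: the function name `find_cycle` and the dictionary `succ` are hypothetical, and `succ` stands in for one chosen Markov successor per variable, which the proof obtains from Lemma 26 rather than from a lookup table.

```python
def find_cycle(start, successor):
    """Follow successor[...] from start until a vertex repeats.

    Returns the vertices of the cycle that is reached, in path order.
    Terminates whenever every reachable vertex has a successor and the
    vertex set is finite.
    """
    position = {}  # vertex -> index of its first occurrence on the path
    path = []
    w = start
    while w not in position:
        position[w] = len(path)
        path.append(w)
        w = successor[w]
    return path[position[w]:]


# Toy successor map (hypothetical): w0 leads into the cycle w1, w2, w3.
succ = {"w0": "w1", "w1": "w2", "w2": "w3", "w3": "w1"}
cycle = find_cycle("w0", succ)  # ["w1", "w2", "w3"]
```

In the toy run the start vertex `w0` is not on the returned cycle, which matches the remark that the path may merely lead to a Markov cycle.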


The proof of Lemma 14 actually shows a slightly stronger result than the statement of Lemma 14. The proof shows that whenever $R(\underline{x}, \vec{z})$ belongs to an attack cycle of size 2 that is part of an initial strong component of the attack graph, then the Markov graph contains a directed path from $x$ to a Markov cycle with the desired properties. This is illustrated by the following example.


Example 21 Let $q = \{R_1(x, u_1), R_2(u_1, u_2), R_3(u_2, u_3), R_4(u_3, y), R_5(y, u_1), S^c(u_2, y, x)\}$. In the attack graph of $q$, every $R_i$-atom attacks every other atom of $q$, and all these attacks are weak.

The Markov graph of $q$ is shown in Figure 4. As predicted by the proof of Lemma 14, for every variable among $x, y, u_1, u_2, u_3$, there is a path that starts from the variable and ends in a Markov cycle. Notice, however, that $x$ itself is not part of a Markov cycle. $\triangleleft$


C.6 Proof of Lemma 16

Proof of Lemma 16 Construct a maximal sequence

$$\mathbf{db}_0, \mathbf{g}_1, \mathbf{db}_1, \mathbf{g}_2, \mathbf{db}_2, \dots, \mathbf{g}_n, \mathbf{db}_n \qquad (15)$$

such that $\mathbf{db}_0 = \mathbf{db}$ and for every $i \in \{1, \dots, n\}$,

1. $\mathbf{g}_i$ is a gblock of $\mathbf{db}_{i-1}$ such that some repair of $\mathbf{g}_i$ is not grelevant for $q$ in $\mathbf{db}_{i-1}$;

2. $\mathbf{db}_i = \mathbf{db}_{i-1} \setminus \mathbf{g}_i$.

Clearly, $\mathbf{db}_n$ is gpurified relative to $q$, and by repeated application of Lemma 15, every repair of $\mathbf{db}$ satisfies $q$ if and only if every repair of $\mathbf{db}_n$ satisfies $q$.

It remains to be shown that $\mathbf{db}_n$ can be computed in polynomial time. Clearly, the above sequence (15) satisfies $n \leq |\mathbf{db}|$. Condition 1 can be tested in polynomial time, as argued in the sequel of this proof.


First, every uncertain database that is purified relative to $q$ has at most polynomially many gblocks, and every gblock has at most polynomially many repairs. Further, for any repair $\mathbf{s}$ of some gblock $\mathbf{g}_i$, the following are equivalent:

1. $\mathbf{s}$ is grelevant for $q$ in $\mathbf{db}_{i-1}$;

2. there exists a repair $\mathbf{r}$ of $\mathbf{db}$ such that $\mathbf{s} \subseteq \mathbf{r}$ and for some valuation $\theta$ over $\mathrm{vars}(q)$ and some fact $A \in \mathbf{s}$, $A \in \theta(q) \subseteq \mathbf{r}$; and

3. $(\mathbf{db}_{i-1} \setminus \mathbf{db}_{\mathbf{s}}) \cup \mathbf{s} \models q$, where $\mathbf{db}_{\mathbf{s}}$ is the subset of $\mathbf{db}$ that contains all facts whose relation name occurs in $\mathbf{s}$.

The first two items are equivalent by definition. Equivalence of the last two items follows from the observation that if some atom $A \in \mathbf{s}$ is relevant for $q$ in $\mathbf{r}$, then every atom of $\mathbf{s}$ must be relevant for $q$ in $\mathbf{r}$. The latter test is obviously in polynomial time. □